Posts

Sqoop file formats

Problem 4:
- Import the orders table from MySQL as a text file to the destination /user/cloudera/problem5/text. Fields should be terminated by a tab character ("\t") and lines should be terminated by a newline character ("\n").
- Import the orders table from MySQL into HDFS at the destination /user/cloudera/problem5/avro. The file should be stored as an Avro file.
- Import the orders table from MySQL into HDFS at the folder /user/cloudera/problem5/parquet. The file should be stored as a Parquet file.
- Transform/convert the data files at /user/cloudera/problem5/avro and store the converted files at the following locations and in the following file formats:
  - save the data to HDFS using Snappy compression as a Parquet file at /user/cloudera/problem5/parquet-snappy-compress
  - save the data to HDFS using gzip compression as a text file at /user/cloudera/problem5/text-gzip-compress
  - save the data to HDFS using no compression as a sequence file at /user/cloudera/problem5/sequence
  - save the data to HDFS using Snappy compres...
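A minimal PySpark sketch of the conversion step only, assuming a pyspark session where the spark-avro package is available on the classpath; the sqoop imports themselves are run from the sqoop command line, and the paths come from the problem statement. Only the first three conversions are shown.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("problem5-convert").getOrCreate()

    # Read the Avro files written by the sqoop import
    orders = spark.read.format("avro").load("/user/cloudera/problem5/avro")

    # Parquet file with Snappy compression
    orders.write.option("compression", "snappy") \
        .parquet("/user/cloudera/problem5/parquet-snappy-compress")

    # Text (CSV-style) file with gzip compression
    orders.write.option("compression", "gzip") \
        .csv("/user/cloudera/problem5/text-gzip-compress")

    # Sequence file with no compression: sequence files store key/value pairs,
    # so each row is mapped to a (key, value) pair of strings first
    orders.rdd \
        .map(lambda row: (str(row[0]), ",".join(str(c) for c in row))) \
        .saveAsSequenceFile("/user/cloudera/problem5/sequence")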

Apache Spark

******************************************Spark APIs***********************************************
In order to process data in Hadoop we need to convert the data into an RDD (Resilient Distributed Dataset), and then we can apply the necessary transformations and actions on the RDD. Transformations are used to process the data, while actions return results from the processed data. When a transformation is applied to an RDD, another RDD is created; once an RDD is created we cannot modify it, but we can create another RDD from the previous RDD.
Actions:
first() : used to get the first record of the RDD (e.g. orderItems.first())
take(n) : gets the first n records from the dataset
collect() : used to convert the RDD into a Python collection; this is used when we have to apply APIs that are not present in the Spark APIs but are available in the Python APIs.
parallelize() : used to convert a Python collection into an RDD. When we read data from a local file and we want to process...
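A short sketch of these APIs in pyspark, assuming an interactive pyspark shell where sc (the SparkContext) already exists; the input path used here is hypothetical.

    # assumes a pyspark shell where `sc` is already defined; the path is hypothetical
    order_items = sc.textFile("/user/cloudera/retail_db/order_items")

    print(order_items.first())   # first record of the RDD
    print(order_items.take(5))   # first 5 records, returned as a Python list

    # collect(): bring the whole RDD back to the driver as a Python list so
    # plain Python APIs can be applied (only safe for small datasets)
    rows = order_items.collect()
    print(len(rows))

    # parallelize(): turn a local Python collection back into an RDD
    prices = sc.parallelize([199.99, 250.0, 129.99])
    print(prices.count())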

Apache Sqoop

INTRODUCTION: In today's world tons of data are generated from different sources, for example social networking sites, e-commerce websites, sensors, etc. Many companies store these data and analyze them to get insights that help improve their business. To process such data sets we need an environment that can handle huge amounts of data. Hadoop is such an environment: a framework that provides distributed storage and helps compute the data across clusters of computers. Now that we know we can get faster computation of this huge data in the Hadoop environment, the next challenge is getting the data from the outside world into the Hadoop environment. This can be done by many tools such as Apache Flume, Apache Kafka, and Apache Sqoop. In this blog I will go through some of the concepts and operations that we use in Sqoop. ...