Apache Spark
******************************************Spark APIs***********************************************
To process data in Hadoop with Spark, we first load the data into an RDD (Resilient Distributed Dataset) and then apply the necessary transformations and actions to that RDD. Transformations describe how the data is processed; actions trigger the computation and return a result to the driver program. When a transformation is applied to an RDD, a new RDD is created. Once an RDD is created we cannot modify it, but we can create another RDD from the previous one.

Actions:
first() : returns the first row of the RDD (e.g. orderItems.first())
take(n) : returns the first n rows of the dataset
collect() : converts the RDD into a Python collection. This is used when we have to apply an API that is not present in the Spark API but is available in the Python API.

parallelize() : the reverse of collect(); it converts a collection into an RDD. It is a method of the SparkContext rather than an action on an RDD.

When we read the data from the local file and if we want to proces...