PySpark Workshop by Meghann Agarwal


Details
Meghann Agarwal, a data scientist at Curb, will give a hands-on workshop covering MapReduce, Hadoop, and Spark.
Apache Spark is a general framework for large-scale data processing on a cluster. If your data set is larger than the memory of a single machine, you need to parallelize calculations across a cluster, and you want less I/O overhead than Hadoop requires, Spark may be your solution. PySpark lets Spark users write their code in Python and make use of its libraries. This class is a hands-on introduction to Spark and PySpark that covers the basic concepts needed to get started, through examples and exercises.
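As a taste of the MapReduce model the workshop builds on, here is a minimal sketch of a word count in plain Python (no Spark required). The comments note the PySpark RDD operations each phase corresponds to (flatMap, reduceByKey); the sample data is made up for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    # "flatMap" step: emit a (word, 1) pair for every word in every line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "reduceByKey" step: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

# illustrative input; on a cluster each phase would run in parallel
lines = ["spark is fast", "spark runs on a cluster"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["spark"])  # → 2
```

In Spark the same pipeline collapses to a couple of RDD calls, with the shuffle handled for you.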
Audience: Beginners to Spark and its APIs. A programming background and some experience with Python are assumed.
In preparation for the workshop, please sign up for Databricks Community Edition: http://go.databricks.com/free-trial ... I will also be giving away t-shirts for the best questions from the audience. Hope to see you there.