
Details

Meghann Agarwal, a data scientist at Curb, will be giving a hands-on workshop covering MapReduce, Hadoop, and Spark.

Apache Spark is a general framework for large-scale data processing on a cluster. If your data set is larger than a single machine's memory, your calculations need to be parallelized across a cluster, and you want less I/O overhead than Hadoop MapReduce incurs, Spark may be your solution. PySpark lets Spark users write their code in Python and make use of Python's libraries. This class will be a hands-on introduction to Spark and PySpark, explaining the basic concepts you need to get started through examples and exercises.
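For a flavor of what PySpark code looks like, here is a minimal word-count sketch using the RDD API; the input path and app name are hypothetical and not taken from the workshop materials:

    # Minimal PySpark sketch (illustrative only): count words in a text file.
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")  # hypothetical app name

    counts = (sc.textFile("input.txt")               # hypothetical input file
                .flatMap(lambda line: line.split())  # split lines into words
                .map(lambda word: (word, 1))         # pair each word with a count of 1
                .reduceByKey(lambda a, b: a + b))    # sum counts per word across the cluster

    for word, n in counts.take(10):  # pull a small sample back to the driver
        print(word, n)

    sc.stop()

Each transformation (flatMap, map, reduceByKey) is evaluated lazily across the cluster; only the final take() triggers computation.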

Audience: Beginners to Spark and its APIs. A programming background and some experience with Python are assumed.

In preparation for the workshop, please sign up for Databricks Community Edition - http://go.databricks.com/free-trial ... Also, I will be giving away t-shirts for the best questions from the audience. Hope to see you there.
