
PySpark Workshop by Meghann Agarwal

Hosted by Brandon C. Loudermilk

Details

Meghann Agarwal, a data scientist at Curb, will be giving a hands-on workshop covering MapReduce, Hadoop, and Spark.

Apache Spark is a general-purpose framework for large-scale data processing on a cluster. Spark may be your solution if you need to analyze a data set larger than the memory of a single machine, parallelize your calculations across a cluster, and incur less I/O overhead than Hadoop MapReduce requires. PySpark lets Spark users write their code in Python and make use of Python's libraries. This class will be a hands-on introduction to Spark and PySpark, using examples and exercises to explain the basic concepts you need to get started.
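For a feel of the map/reduce pattern ahead of the session, here is a minimal sketch in plain Python (not Spark, and not taken from the workshop materials). It counts words in three steps: map each line to (word, 1) pairs, group the pairs by word, then reduce each group to a total. The function names and sample lines are made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group values by key, as a cluster would before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on a small in-memory input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, small enough to check by hand.
sample_lines = [
    "spark runs on a cluster",
    "pyspark runs spark from python",
]
counts = word_count(sample_lines)
print(counts)
```

In PySpark the same shape is typically written with RDD methods such as flatMap, map, and reduceByKey, with the grouping step handled by the cluster rather than by a dictionary on one machine.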

Audience: Beginners to Spark and its APIs. A programming background and some experience with Python are assumed.

To prepare for the workshop, please sign up for Databricks Community Edition - http://go.databricks.com/free-trial ... Also, I will be giving away t-shirts for the best questions from the audience. Hope to see you there.

San Antonio Data Science
Geekdom Event Center
131 Soledad Street · San Antonio, TX