June Tech Talk: PySpark


Details
Description:
Apache Spark is a general-purpose framework for large-scale data processing on a cluster. If you need to analyze a data set larger than the memory of a single machine, want to parallelize your calculations across a cluster, and need less I/O overhead than Hadoop requires, Spark may be your solution. PySpark lets Spark users write their code in Python and make use of Python's libraries. This class will be a hands-on introduction to Spark and PySpark, using examples and exercises to explain the basic concepts you need to get started.
Audience:
For beginners to Spark and its APIs. A programming background and some experience with Python are assumed.
Speaker:
Meghann Agarwal
Code of Conduct:
PyLadiesATX is dedicated to providing a respectful, harassment-free community. Please read & follow our Code of Conduct: http://www.pyladies.com/CodeOfConduct/
If you would like to report an incident or contact our leadership team, please fill out this form (https://docs.google.com/forms/d/1D2imFi-DiClcPj4RyP7zRB9cUJlQ0B9sgZbK_kA8_0A/viewform). No identifying information is required.
Additional Info:
TBA
