Details

Description:

Apache Spark is a general-purpose framework for large-scale data processing on a cluster. If your data set is larger than the memory of a single machine, you need to parallelize your calculations across a cluster, and you want less I/O overhead than Hadoop requires, Spark may be your solution. PySpark lets Spark users write their code in Python and make use of Python's libraries. This class is a hands-on introduction to Spark and PySpark: through examples and exercises, we will explain the basic concepts you need to get started.

Audience:

For beginners to Spark and its APIs. A programming background and some experience with Python are assumed.

Speaker:

Meghann Agarwal

Code of Conduct:

PyLadiesATX is dedicated to providing a respectful, harassment-free community. Please read & follow our Code of Conduct: http://www.pyladies.com/CodeOfConduct/

If you would like to report an incident or contact our leadership team, please fill out this form (https://docs.google.com/forms/d/1D2imFi-DiClcPj4RyP7zRB9cUJlQ0B9sgZbK_kA8_0A/viewform). No identifying information is needed.

Additional Info:

TBA

Sponsors

Capital Factory

Location sponsor

RetailMeNot

Pizza at Monthly Meetups!

Walmart Technology

Location and snacks sponsor.

GitHub

Technical Sponsor for Mentoring
