An Introduction to PySpark
Details
What is PySpark? Can it solve all of my data problems? Are you sure I can’t just use pandas instead?
This talk will aim to answer at least some of the questions you may have if you’re starting out with PySpark or looking to scale up your use of it when working with data. We’ll cover some code examples, an introduction to what goes on behind the API (and why you should care), and some common issues I’ve run into, so that you can hopefully spend slightly less time getting frustrated by them.
I’ll try to keep the talk a mix of useful introductory hints and more technical background. So whether you’re hoping to ace your next data engineering interview or are just curious about what might be going on in the world of data, you’ll hopefully leave this talk feeling a bit more confident in your next steps with big data.
---
Alex spent two and a half years in the APS using PySpark to process, validate and link datasets covering tens of millions of individuals. She's previously tutored six different computer science courses at ANU, including Data Mining, and currently works as a Software Engineer at Geoscape Australia. She's particularly interested in all the pre-processing and engineering that goes into data before researchers use it for things like machine learning.
---
Pizza and drinks provided.
Huge thanks to our sponsors: Linux Australia, Xero, Reposit Power and ANU
