Details

📆 Agenda

  • 19:00 Welcome to the PUB (Python Users Berlin) – setting up
  • 19:15 Main talk
  • 20:00 Lightning talks
  • 20:30 Social gathering

🎙 Main talk by Sam Bail: PySpark
PySpark is a powerful library that brings Apache Spark’s distributed computing capabilities to Python, making it a key tool for processing large-scale data efficiently. In this talk, data engineer and analyst Sam Bail provides a structured, hands-on introduction to PySpark, starting with an overview of Apache Spark, its architecture, and its ecosystem, including Databricks. Learn about Spark’s core concepts, such as the DataFrame API, transformations, lazy evaluation, and actions, before setting up a lab environment and working with a real dataset. Plus, gain insights into how PySpark fits into the broader data engineering ecosystem, along with best practices for running PySpark in a production environment.

👩‍💻 About Sam Bail
Data engineer and engineering leader with 10+ years building platforms and teams across healthcare, marketplaces, and data infrastructure at NYC tech startups. I teach data engineering courses on LinkedIn Learning and O’Reilly, focused on making complex topics accessible to everyone. I also founded Bright Nights Social, an alcohol-free nightlife community that’s produced 100+ events across NYC. You’ll probably find me on a dance floor or running around Berlin this summer (training for the Berlin marathon).

⚡️ Lightning talks
We would like you to give a lightning talk (shorter than 10 minutes) about what you are doing with Python.

📍 This will be a face-to-face meeting.
