Details

Modern data analytics often involves large datasets that do not fit into memory or are too slow to process with traditional tools. Apache Spark is one of the most widely used frameworks for scalable data processing and analytics.

This session gives a clear and practical introduction to data analytics using Apache Spark and Python. It focuses on understanding how Spark works and how to use it to analyse large datasets efficiently.

Who is this for?

Students, developers, and anyone who works with data and wants to analyse large datasets using Python. This session is useful if you have outgrown Pandas, work with big CSV or log files, or want to learn how modern data analytics systems handle scale.

Who is leading the session?

The session is led by Dr. Stelios Sotiriadis, CEO of Warestack and Associate Professor and MSc Programme Director at Birkbeck, University of London. He works in big data systems, distributed computing, cloud platforms, and Python-based data analytics. He holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked with Huawei, IBM, Autodesk, and several startups. He has taught at Birkbeck since 2018 and founded Warestack in 2021.

What we will cover

This is a hands-on introduction with practical examples and short exercises. Topics include loading data into Spark, understanding DataFrames, basic transformations and actions, filtering and aggregations, grouping and joins, using Spark SQL, and understanding when and why Spark is a better fit than Pandas.
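As a taste of the "transformations and actions" topic: in Spark, transformations such as filters only describe work, and nothing is computed until an action asks for a result. The sketch below is a plain-Python analogy of that idea using generators, not Spark code. The sample rows and the function names `keep_large` and `total_by_city` are made up for illustration.

```python
# Plain-Python analogy (no Spark required): a generator stands in for a lazy
# transformation, and consuming it stands in for an action.
from collections import defaultdict

# Hypothetical sample rows: (city, amount)
rows = [("London", 120), ("Toronto", 80), ("London", 45), ("Derby", 200)]

evaluated = []  # records which rows have actually been processed


def keep_large(records, threshold):
    """Lazy 'transformation': yields rows whose amount is at least threshold."""
    for city, amount in records:
        evaluated.append(city)
        if amount >= threshold:
            yield city, amount


def total_by_city(records):
    """'Action': consumes the pipeline and returns a grouped sum."""
    totals = defaultdict(int)
    for city, amount in records:
        totals[city] += amount
    return dict(totals)


pipeline = keep_large(rows, 50)       # only describes the work
work_before_action = len(evaluated)   # no rows have been touched yet
totals = total_by_city(pipeline)      # the whole pipeline runs here

print(work_before_action, totals)
# prints: 0 {'London': 120, 'Toronto': 80, 'Derby': 200}
```

Because a pipeline of this kind is built before it is run, an engine has room to plan the work as a whole before touching the data, which is one reason this style suits large datasets.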

Requirements

A laptop (Windows, macOS, or Linux) with Python, pip, and Visual Studio Code installed. Spark will be provided via a local setup or the lab environment. Lab computers can be used if needed.

Format

A 1.5-hour live session with short explanations, live coding, and guided exercises. The session runs in person, with streaming available for remote participants.

Prerequisites

Basic to intermediate Python knowledge, including functions, loops, and basic data structures. Prior experience with Pandas is helpful but not required.
