Data Cleaning with Python Pandas
Details
Do you work with tabular data? Learn how to clean, prepare, and organise datasets properly in Python.
Working with real data means dealing with missing values, errors, duplicates, and inconsistent formats. Before any analysis or machine learning, data must be cleaned and prepared properly. Data cleaning is one of the most important and time-consuming tasks in data work.
This session gives a clear and practical introduction to data cleaning using Python and Pandas. It focuses on common real-world problems and shows simple, correct ways to fix them.
Who is this for?
Students, developers, and anyone who works with data and needs to clean and prepare datasets using Python. This session is useful if you work with messy files such as CSV or Excel, want to understand how Pandas handles missing or incorrect data, and want to build reliable data analysis pipelines.
Who is leading the session?
The session is led by Dr. Stelios Sotiriadis, CEO of Warestack and Associate Professor and MSc Programme Director at Birkbeck, University of London.
He works in data processing, distributed systems, cloud computing, and Python-based analytics. He holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked with Huawei, IBM, Autodesk, and several startups. He has taught at Birkbeck since 2018 and founded Warestack in 2021.
What we will cover
This is a hands-on introduction with real examples and short exercises. Topics include loading data with Pandas, inspecting datasets, handling missing values, fixing data types, removing duplicates, cleaning text data, filtering and transforming columns, combining datasets, and common data cleaning mistakes to avoid.
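To give a flavour of these topics, here is a minimal sketch of a cleaning pass in Pandas. It is not taken from the session materials: the dataset, the column names (name, age, city), and the choice of cleaning steps are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical messy data, invented for this sketch (not from the session).
raw_csv = io.StringIO(
    "name,age,city\n"
    " Alice ,30,London\n"
    "bob,,london\n"
    " Alice ,30,London\n"
    "Carol,n/a,PARIS\n"
)

# Load the data, treating empty cells and "n/a" as missing values.
df = pd.read_csv(raw_csv, na_values=["n/a"])

# Inspect: column types and the number of missing values per column.
print(df.dtypes)
print(df.isna().sum())

# Clean text: strip stray whitespace and normalise capitalisation.
df["name"] = df["name"].str.strip().str.title()
df["city"] = df["city"].str.strip().str.title()

# Remove exact duplicate rows (only detectable once the text is normalised).
df = df.drop_duplicates().reset_index(drop=True)

# Handle missing values: here we only count them rather than guess a fill value.
missing_ages = int(df["age"].isna().sum())

# Fix the data type: the nullable "Int64" dtype stores whole numbers
# while keeping missing entries as <NA> instead of forcing floats.
df["age"] = df["age"].astype("Int64")

print(df)
print("Rows with a missing age:", missing_ages)
```

The order matters in this sketch: text is normalised before duplicates are dropped, so that rows differing only in spacing or case would be recognised as duplicates. Whether to fill, drop, or keep missing values depends on the dataset, which is why the example only counts them.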
Requirements
A laptop (Windows, macOS, or Linux) with Python, pip, and Visual Studio Code installed. Lab computers can be used if needed.
Format
A 1.5-hour live session with short explanations, live coding, and guided exercises. The session runs in person, with streaming available for remote participants.
Prerequisites
Basic to intermediate Python knowledge, including functions, loops, and basic data structures. Some familiarity with Pandas is helpful but not required.
