Data Pre-Processing Best Practices with PyLadies Boston & Women in Data


Details
Join Fay Shaw for a practical, hands-on session focused on data preprocessing, an essential step in any data science or machine learning workflow. Whether you’re new to data science or looking to streamline your current process, this session will introduce actionable tools, techniques, and tips to help you clean, transform, and prepare your data effectively.
Fay will walk through a real-world example using data from Belmin et al.’s 2022 Scientific Data paper, LivWell: a sub-national dataset on the living conditions of women and their well-being for 52 countries. The dataset includes 265 indicators across 447 regions, offering a rich foundation for practical demonstration.
Participants will learn how to:
- Explore and filter data using DataFrame functions
- Clean and manipulate strings
- Apply common transformation techniques
- Structure data for downstream analysis or machine learning
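For a rough idea of what these steps can look like in pandas, here is a minimal sketch. It uses a tiny made-up table, not the LivWell data: the column names (country, region, indicator, value) and all values are hypothetical and chosen only for illustration, and the session materials may take a different approach.

```python
import pandas as pd

# Toy data with HYPOTHETICAL column names and made-up values.
# This is not the LivWell schema; it only mimics a long-format indicator table.
raw = pd.DataFrame(
    {
        "country": [" kenya", "Kenya ", "PERU", "peru "],
        "region": ["Nairobi", "Nairobi", "Lima", "Lima"],
        "indicator": ["literacy", "employment", "literacy", "employment"],
        "value": [0.90, 0.60, 0.95, None],
    }
)

# Explore: basic shape and missing-value counts per column.
print(raw.shape)
print(raw.isna().sum())

# Clean strings: trim whitespace and normalize capitalization.
clean = raw.assign(country=raw["country"].str.strip().str.title())

# Filter: keep only rows that actually have a value.
clean = clean.dropna(subset=["value"])

# Structure: pivot from long to wide, one row per region and
# one column per indicator.
wide = clean.pivot_table(
    index=["country", "region"], columns="indicator", values="value"
).reset_index()
wide.columns.name = None

# Transform: min-max scale a single indicator column to the range [0, 1].
lit_min, lit_max = wide["literacy"].min(), wide["literacy"].max()
wide["literacy_scaled"] = (wide["literacy"] - lit_min) / (lit_max - lit_min)

print(wide)
```

The wide layout (one row per region, one column per indicator) is one common shape for downstream modeling; other analyses may be easier with the long format kept as-is.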
Prerequisites: Basic familiarity with Python and working in Colab or Jupyter notebooks.
Link to materials: https://github.com/fayshaw/data_preprocessing
