PythonWA Meetup Feb - Rio Tinto Grad Program & DataFrame validation in Python


Details
Welcome to our February Python WA meetup!
We welcome Python enthusiasts of all skill levels. We'll be starting off with food and beverages at 5:30 pm, with talks beginning at 6 pm.
Talk 1: Rio Tinto Graduate Program for Python Developers and possibilities for new starters in data science
by Vandana Sharma
Changing careers to tech, especially from a non-technical background, can be both exciting and overwhelming. This talk will dive deep into the experience of making that transition, offering practical advice and insights from the perspective of a graduate role in data science in the mining industry, focusing on the key dos and don’ts to ensure success.
We'll explore how Python serves as a powerful tool for data science, simplifying complex tasks and enabling deeper insights. The talk will also cover common challenges that Python specialists face early in their careers, including developing essential skills and adapting to new work environments. Finally, we'll share some of the many possibilities for solving real-world problems with data.
Vandana is a graduate data scientist at Rio Tinto who transitioned into the field after gaining experience in industries such as manufacturing and retail. Passionate about problem-solving, she enjoys learning new things, traveling, and watching movies in her free time.
Talk 2: pd.read_csv is NOT all you need: DataFrame validation in Python
by Adam Graham
Many beginner data tradies dream of replacing the chaotic Excel workbooks of their forebears with elegant, reproducible Python workflows. However, real-world datasets rarely adhere to our modest assumptions, leading to frustration over inconsistent formats, unexpected null values, and non-ISO-compliant date columns.
In this short talk, aimed at the emerging data tradie, I will give a brief tour of the tools for *DataFrame validation* in Python. Pola.rs, Pandera, Pydantic, and even the humble pd.to_datetime() equip us with the means to tame unruly, manually curated spreadsheets and bring order to our data pipelines.
The GitHub repository for this talk can be found at: https://github.com/adamdoescode/PyValidationTalk
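To give a flavour of the problems described above, here is a minimal sketch that uses only pandas (`pd.to_datetime` and `pd.to_numeric` with `errors="coerce"`). It is not taken from the talk or its repository. The column names (`site`, `sample_date`, `pm25`) and the `validate` helper are hypothetical. Libraries such as Pandera or Pydantic let you declare rules like these as a schema instead of writing them by hand.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical messy input: a non-ISO date, a non-numeric reading,
# and a missing value in a column that should always be filled.
raw = pd.DataFrame(
    {
        "site": ["Perth", "Fremantle", None],
        "sample_date": ["2025-02-03", "03/02/2025", "2025-02-05"],
        "pm25": ["12.1", "9.8", "n/a"],
    }
)


def validate(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Coerce column types and collect human-readable problems (hypothetical helper)."""
    problems: list[str] = []
    out = df.copy()

    # Accept only ISO 8601 dates (YYYY-MM-DD); anything else becomes NaT
    out["sample_date"] = pd.to_datetime(
        out["sample_date"], format="%Y-%m-%d", errors="coerce"
    )
    for idx in out[out["sample_date"].isna()].index:
        problems.append(
            f"row {idx}: sample_date {df.at[idx, 'sample_date']!r} is not ISO 8601"
        )

    # Non-numeric strings such as "n/a" become NaN
    out["pm25"] = pd.to_numeric(out["pm25"], errors="coerce")
    for idx in out[out["pm25"].isna()].index:
        problems.append(f"row {idx}: pm25 {df.at[idx, 'pm25']!r} is not numeric")

    # A required column must not contain nulls
    for idx in out[out["site"].isna()].index:
        problems.append(f"row {idx}: site is missing")

    return out, problems


if __name__ == "__main__":
    clean, problems = validate(raw)
    for p in problems:
        print(p)
```

Running the script prints one line per problem: the ambiguous `03/02/2025` in row 1, then the `n/a` reading and the missing site in row 2. Rejecting non-ISO dates, rather than letting a parser guess, avoids silently confusing 3 February with 2 March.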
*Adam is an air quality data scientist with Environmental Technologies and Analytics. His experience in DataFrame validation comes from repeated attempts to beat wild Excel workbooks into conformity and handling a diverse cast of "quality controlled" datasets. In a past life he was a research biologist and science communicator. He remains a committed bird nerd and maintains a neobrutalist website at https://adamdoescode.github.io/.*
Thanks to our food and beverage sponsor: https://horizondigital.au/
and venue sponsor: https://spacecubed.com/
