PyData Berlin 2025 May Meetup


Details
Welcome to the PyData Berlin May meetup!
We would like to welcome you all starting from 18:45. There will be food and drinks. The talks begin around 19:30, and the doors will close at 19:30. Make sure to arrive on time!
Please provide your first and last name for the registration because this is required for the venue's entry policy. If you cannot attend, please cancel your spot so others are able to join as the space is limited.
Host: Ecosia is excited to welcome you to this month's edition of PyData.
Entrance is in Hof 4 - there will be signs - then up to the 3rd floor of the building.
**************************************************************************
The Lineup for the evening
Talk 1: Specializing Small Language Models With Less Data
Abstract: I will present a practical, end-to-end solution for training SLMs using synthetic data, covering key aspects from data curation through training to model evaluation. You will leave with concrete strategies for building efficient, domain-specific language models for production environments.
Most AI teams are exploring the possibilities of LLMs rather than focusing on margins, but efficiency will soon become important. Small, specialized language models (SLMs) offer a promising alternative, but training them requires extensive manually labeled datasets - a significant engineering bottleneck.
In this talk, I will discuss how large language models can be used to help generate and curate the data needed for SLM training. Using extractive question answering as a case study, we'll examine how this approach can dramatically reduce data collection time while maintaining model performance.
Speaker: Jacek Golebiowski
Bio: Jacek is the CTO of distil labs, building specialised AI agents that can be deployed on-device/on-prem with minimal data. Before that, he was a machine learning team lead at AWS, focused on Automated ML and natural language processing. He holds a PhD in Machine Learning for Quantum Mechanics from Imperial College London.
---
Talk 2: Exploring fairlearn and practical strategies for assessing and mitigating harm in AI systems
Abstract: As AI becomes a more significant part of our everyday lives, ensuring these systems are fair is more important than ever. In this session, we’ll discuss how to define fairness and the potential harms our algorithms can have on people and society. We'll introduce fairlearn, a community-driven, open-source project that offers practical tools for assessing and mitigating harm in AI systems. We’ll also explore how to discuss bias, different types of harm, the idea of group fairness and how they all relate to fairlearn's toolkit. To make it all concrete, we’ll walk through a real-world example of assessing fairness and share some hands-on strategies you can use to mitigate harm in your own ML projects.
Speaker: Tamara Atanasoska
Bio: Tamara is a software engineer, OSS contributor and maintainer, and NLP researcher.
---
Lightning talks
There will be slots for 2-3 Lightning Talks (3-5 minutes each).
If you would like to present something, kindly let us know at the start of the meetup :)
***
NumFOCUS Code of Conduct
THE SHORT VERSION
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS.
All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate.
NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form.
Thank you for helping make this a welcoming, friendly community for all.
If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct
***
