

What we’re about
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.
The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
The PyData Code of Conduct governs this meetup. To discuss any issues or concerns relating to the code of conduct or the behavior of anyone at a PyData meetup, please contact NumFOCUS Executive Director Leah Silen (+1 512-222-5449; [leah@numfocus.org](mailto:leah@numfocus.org)) or the group organizer.
We run monthly meetups at changing locations and have organized six conferences, in 2014, 2015, 2016, 2017, 2019, and 2022. You can see our latest meetups, submit a talk idea, and read PyData blog posts on our site: https://berlin.pydata.org.
Please get in touch using info@pydata.berlin.
Twitter: @pydataberlin
Upcoming events (2)
PyData Berlin 2025 May Meetup
Ecosia, Berlin
Welcome to the PyData Berlin May meetup!
We would like to welcome you all starting from 18:45. There will be food and drinks. The talks begin around 19:30 and the doors will close at 19:30. Make sure to arrive on time!
Please provide your first and last name for the registration because this is required for the venue's entry policy. If you cannot attend, please cancel your spot so others are able to join as the space is limited.
Host: Ecosia is excited to welcome you to this month's version of PyData.
Entrance is in Hof 4 - there will be signs - then up to the 3rd floor of the building.
**************************************************************************
The Lineup for the evening
Talk 1: Specializing Small Language Models With Less Data
Abstract: I will present a practical, end-to-end solution for training SLMs using synthetic data, covering key aspects from data curation through training to model evaluation. You will leave with concrete strategies for building efficient, domain-specific language models for production environments.
Most AI teams are exploring the possibilities of LLMs rather than being focused on margins, but soon, efficiency will become important. Small, specialized language models (SLMs) offer a promising alternative, but training them requires extensive manually-labeled datasets - a significant engineering bottleneck.
In this talk, I will discuss how large language models can be used to help generate and curate the data needed for SLM training. Using extractive question answering as a case study, we'll examine how this approach can dramatically reduce data collection time while maintaining model performance.
Speaker: Jacek Golebiowski
Bio: Jacek is the CTO of distil labs, building specialised AI agents that can be deployed on-device/on-prem with minimal data. Before that, he was a machine learning team lead at AWS, focused on Automated ML and natural language processing. He holds a PhD in Machine Learning for Quantum Mechanics from Imperial College London.
---
Talk 2: Exploring fairlearn and practical strategies for assessing and mitigating harm in AI systems
Abstract: As AI becomes a more significant part of our everyday lives, ensuring these systems are fair is more important than ever. In this session, we’ll discuss how to define fairness and the potential harms our algorithms can have on people and society. We'll introduce fairlearn, a community-driven, open-source project that offers practical tools for assessing and mitigating harm in AI systems. We’ll also explore how to discuss bias, different types of harm, the idea of group fairness and how they all relate to fairlearn's toolkit. To make it all concrete, we’ll walk through a real-world example of assessing fairness and share some hands-on strategies you can use to mitigate harm in your own ML projects.
Speaker: Tamara Atanasoska
Bio: Tamara is a software engineer, OSS contributor and maintainer and NLP researcher.
---
Lightning talks
There will be slots for 2-3 Lightning Talks (3-5 minutes each).
Kindly let us know if you would like to present something at the start of the meetup :)
***
NumFOCUS Code of Conduct
THE SHORT VERSION
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS.
All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate.
NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form.
Thank you for helping make this a welcoming, friendly community for all.
If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct
***
Network event: PyData London 2025
373 attendees from 136 groups hosting
Get ready to unleash your inner data aficionado at PyData London 2025, happening June 6-8 at Convene Sancroft, St. Paul’s! This three-day, in-person event is your golden ticket to dive into live keynotes, talks, and lightning sessions alongside fellow data enthusiasts.
Have an idea you want to share? Submit your talk proposal by Feb. 24. Tickets sold out in 2024, so don’t wait: grab yours today!
Past events (130)
Network event: PyData Virginia 2025
114 attendees from 131 groups hosting
This event has passed