PyData Berlin 2026 May Meetup
Details
Welcome to the PyData Berlin May meetup!
We would like to welcome you all starting from 18:00. There will be food and drinks. The talks begin around 18:30 and the doors will close at 18:45. Make sure to arrive on time!
Please provide your first and last name when registering, as the venue's entry policy requires it. If you cannot attend, please cancel your spot so that others can join, as space is limited.
Host:
GetYourGuide is excited to welcome you to this month's edition of PyData.
**************************************************************************
The Lineup for the evening
Talk 1: Text2SQL in the Wild — Agentic Workflows & Semantic Models on Customer Data
Abstract: Translating a business user's natural language question into a reliable SQL query has huge value for business analysts, but doing it accurately at scale, on messy enterprise data, for complex queries, is incredibly difficult. This talk walks through a production Text2SQL system built on four pillars: semantic models built on customer data, RAG-based semantic search, an agentic verification and correction loop for building the SQL, and an evaluation framework for the entire process.
Raw schema information is not enough for Text2SQL to work. Rather, thorough documentation, including business metrics, synonyms, and qualified example queries, is required. We'll cover how to automate this documentation pipeline by mining query logs, generating descriptions with LLMs, and organizing everything into a knowledge graph and vector search index. From there, we'll dive into the agentic flow that takes a user question and iteratively generates, validates, and fixes SQL against a live database. The agentic framework analyses syntax errors and execution results to decide how to fix the SQL. Participants will leave with practical takeaways on RAG architecture for structured data, agentic retry patterns, and how to think about evaluation for Text2SQL systems in production.
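For readers new to the idea, the generate, validate, and fix loop described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's system: it assumes an in-memory SQLite database, and `run_with_retries`, `fake_generator`, and the `orders` table are hypothetical names in which a canned function stands in for the LLM call.

```python
import sqlite3
from typing import Callable, Optional


def run_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
):
    """Generate SQL, execute it, and feed any database error back to the generator."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, feedback)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # The error message becomes context for the next generation attempt.
            feedback = f"Attempt {attempt} failed: {exc}. Previous SQL: {sql}"
            continue
        return sql, rows
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts: {feedback}")


def fake_generator(question: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for an LLM: the first draft uses a wrong column
    # name, and the draft is corrected once error feedback is available.
    if feedback is None:
        return "SELECT SUM(amount) FROM orders"
    return "SELECT SUM(total) FROM orders"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 5.5)])

sql, rows = run_with_retries(conn, "What is the total revenue?", fake_generator)
print(sql, rows)
```

A production loop would presumably also inspect the returned rows (for example, empty or implausible results) and not only database errors, as the abstract suggests.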
Speaker: Oren Matar
Bio: Oren Matar is a principal data scientist and algorithms developer, with a background in social sciences and Bayesian methods. He specializes in NLP and time series forecasting, as well as agentic methods for data retrieval and processing.
Talk 2: Interpreting and Communicating Statistical Models
Abstract: Extracting actionable insights from complex statistical models remains a challenge, as raw coefficients are often uninterpretable due to nonlinearity, interactions, or hierarchical structures. This talk introduces a unified framework for model interpretation based on the principles of the Marginal Effects project (https://marginaleffects.com/). We move away from internal model parameters toward quantities of interest, such as marginal effects and adjusted predictions, that translate statistical output into the "natural language" of the data.
We will demonstrate these concepts in Python using the Bambi interpret module, showcasing a seamless workflow for interpreting Bayesian models built on PyMC. The audience will learn how to use the four primary interpretative pillars (Predictions, Comparisons, Slopes, and Marginal Means) to compute average marginal effects and visualize conditional relationships. By the end of this session, you will be equipped to turn sophisticated GLMs and multi-level models into clear, rigorous narratives that are easily communicated to stakeholders.
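To give a flavour of what "slopes" and "comparisons" mean, here is a minimal NumPy sketch of an average marginal effect and an average comparison for a logistic model. It does not use Bambi or the marginaleffects API; the coefficients in `beta`, the simulated data, and the helper names are all assumptions made purely for illustration.

```python
import numpy as np


def predict(X: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Predicted probabilities of a logistic model with fixed coefficients."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def average_slope(X: np.ndarray, beta: np.ndarray, col: int, eps: float = 1e-6) -> float:
    """Average marginal effect: mean central finite-difference slope of the prediction."""
    X_hi, X_lo = X.copy(), X.copy()
    X_hi[:, col] += eps
    X_lo[:, col] -= eps
    return float(np.mean((predict(X_hi, beta) - predict(X_lo, beta)) / (2 * eps)))


def average_comparison(X: np.ndarray, beta: np.ndarray, col: int, lo: float, hi: float) -> float:
    """Average change in prediction when one column is set to hi versus lo for every row."""
    X_lo, X_hi = X.copy(), X.copy()
    X_lo[:, col] = lo
    X_hi[:, col] = hi
    return float(np.mean(predict(X_hi, beta) - predict(X_lo, beta)))


rng = np.random.default_rng(0)
n = 500
# Columns: intercept, a continuous covariate, a binary treatment indicator.
X = np.column_stack([
    np.ones(n),
    rng.normal(size=n),
    rng.integers(0, 2, size=n).astype(float),
])
beta = np.array([-0.5, 0.8, 1.2])  # hypothetical coefficients, not fitted to anything

ame_x = average_slope(X, beta, col=1)
ace_treat = average_comparison(X, beta, col=2, lo=0.0, hi=1.0)
print(f"average slope of x: {ame_x:.3f}")
print(f"average effect of treatment: {ace_treat:.3f}")
```

Both quantities are on the probability scale, which is the point of the approach: they describe changes in the predicted outcome instead of changes in log-odds. In a Bayesian workflow the same computation would be repeated for each posterior draw of the coefficients.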
Speaker: Juan Orduz
Bio: Juan is a mathematician (Ph.D. Humboldt Universität zu Berlin) and Principal Data Scientist at PyMC Labs. He is interested in interdisciplinary applications of mathematical methods, in particular time series analysis, Bayesian methods, and causal inference.
Lightning talks
There will be slots for 2-3 Lightning Talks (3-5 minutes each) between the two main talks.
Kindly let us know if you would like to present something :)
***
NumFOCUS Code of Conduct
THE SHORT VERSION
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS.
All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate.
NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form.
Thank you for helping make this a welcoming, friendly community for all.
If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct
***
