Hello Python enthusiasts and models both large and small,

The 34th PyData meetup will take place at the Similarweb offices (Dock in Five, reception A, 5th floor). As usual, the talks will start at 18:30, but we encourage you to come as early as 18:00 to socialize and grab refreshments (which you can continue doing during the break and after the talks).

Our main goal is to build a community around Python and data and to make it welcoming to people of all skill and experience levels.

⚡ If you are interested in giving a lightning talk (up to 5 minutes to present an idea, tool, or result at least loosely related to Python and/or data), please contact us before the event or at its start.

Why Your Models Fail: Learning from Regime Shifts in Real Data
(Alena Pavlova, Similarweb)
When building predictive models, we often assume that the data-generating process is stable over time. In practice, this assumption is frequently violated and ignoring it can severely degrade model performance.
In this talk, we use stock market price data to demonstrate how overlooking market regimes leads to misleading results and fragile trading strategies. We start by implementing a simple strategy under the assumption of a single, stationary regime and show how it fails in changing market conditions.
We then introduce a regime-aware approach to infer latent market states. By applying the same strategy only within appropriate regimes, we demonstrate how performance, risk characteristics, and interpretability can change dramatically.
The talk focuses on practical Python implementation, intuitive explanations of regime modeling, and concrete lessons that extend beyond finance to any non-stationary time-series problem. Attendees will learn how to detect regime shifts, incorporate them into modeling workflows, and avoid common pitfalls when working with evolving data.
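To make the idea concrete: one very simple way to flag regime shifts is to track the rolling volatility of returns and split the series into "calm" and "turbulent" states. This toy sketch is not the speaker's method (the talk covers proper latent-state inference); the synthetic data, window size, and threshold below are all illustrative assumptions.

```python
import random
import statistics

random.seed(42)

# Synthetic return series: a calm, gently drifting regime followed by a
# turbulent one. Real data would come from a market feed instead.
calm = [0.001 + random.gauss(0, 0.005) for _ in range(200)]
turbulent = [random.gauss(0, 0.03) for _ in range(200)]
returns = calm + turbulent

def rolling_vol(rets, window=20):
    """Rolling sample standard deviation of returns (a crude regime proxy)."""
    return [statistics.stdev(rets[i - window:i]) for i in range(window, len(rets) + 1)]

def label_regimes(rets, window=20, threshold=0.015):
    """Label each window 'calm' or 'turbulent' by its rolling volatility."""
    return ['calm' if v < threshold else 'turbulent' for v in rolling_vol(rets, window)]

labels = label_regimes(returns)
# A strategy backtested only on the 'calm' windows will behave very
# differently from one fitted to the pooled, mixed-regime series.
```

The point of the exercise is the one made in the abstract: a single stationary model fitted to the pooled series averages over two very different data-generating processes, while conditioning on the detected regime changes both performance and risk characteristics.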

Training Small Language Models with Knowledge Distillation
(Gabriela Kadlecová, distil labs)
Large language models (LLMs) have proven useful across a wide range of real-world applications, from code generation and document analysis to intelligent automation and data extraction. However, their practical adoption comes with some tradeoffs: inference costs can be prohibitive at scale, and sending sensitive data to external cloud APIs raises privacy concerns. Sometimes, a general-purpose model is more than what's needed - if the task is quite specific, a smaller, specialized model could do the job just as well.
Small Language Models (SLMs) are a great alternative - running entirely on local hardware, they keep data private, respond faster, and operate at a fraction of the cost. The key challenge is making them good enough for the task at hand, and this is where knowledge distillation comes in. By having a large "teacher" model automatically generate and refine training examples, we can create a specialized "student" model tailored to a specific task, starting from as few as a handful of seed examples.
In this talk, we walk through the full distillation pipeline: from defining a task and preparing seed data, through synthetic data generation, to fine-tuning an SLM ready for local deployment. We also present benchmark results comparing base and fine-tuned models, and a live demo showing how capable a well-distilled small model can be.
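The pipeline described above can be sketched schematically. Everything here is a toy stand-in: `teacher_label` and `teacher_generate` would in practice be calls to a large model, and the "student" below is a trivial keyword classifier rather than a fine-tuned SLM; only the shape of the pipeline (seeds → teacher-labeled synthetic data → trained student) mirrors the abstract.

```python
# Seed examples: a handful of raw user requests, as in the talk's setup.
seed_examples = ["refund my order", "how do I reset my password"]

def teacher_label(text):
    # Stand-in for a large "teacher" model classifying the request.
    return "billing" if "refund" in text or "charge" in text else "account"

def teacher_generate(seed, n=3):
    # Stand-in for LLM-based paraphrasing / synthetic data generation.
    return [f"{seed} (variant {i})" for i in range(n)]

# 1. Expand the seeds into a synthetic training set, labeled by the teacher.
train = [(t, teacher_label(t)) for s in seed_examples for t in teacher_generate(s)]

# 2. "Fine-tune" the student: here, just memorize which words map to which label.
def train_student(data):
    keyword_to_label = {}
    for text, label in data:
        for word in text.split():
            keyword_to_label.setdefault(word, label)
    def student(text):
        for word in text.split():
            if word in keyword_to_label:
                return keyword_to_label[word]
        return "unknown"
    return student

# 3. The resulting student runs entirely locally, with no teacher in the loop.
student = train_student(train)
```

In the real pipeline, step 2 would be supervised fine-tuning of a small language model on the teacher-generated corpus, but the data flow is the same.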
