PyData Karlsruhe - Coding at Scale | Fast DataFrames


Details
Data Science and AI: in person in Karlsruhe and live on PyData.TV on YouTube
Agenda
18:00 Doors open
18:30 Welcome
18:45 Kickstart Coding at Scale – How Project Template Automation unlocks Developer Productivity - Adrian Freund, Pavel Zwerschke, Bela Stoyan (QuantCo)
19:15 Break: Networking with snacks and beverages
20:15 Dask DataFrame is fast now - Florian Jetter (Coiled)
20:45 Lightning Talks
21:00 Networking with snacks and beverages
21:30 End
⚡️ Lightning Talks:
1. Tim Berti - A case study of custom kernels
2. Natalia Mokeeva - Find the best strategy to get involved in Open Source
3. Dr. Lisa A. Chalaguine - Legal Argument Mining from Court Decisions - A Flyby
Asking questions:
Please go to Slido to ask questions.
How to sign up for on site
It's important for us to make this meetup happen in a responsible way, so only a limited number of seats is available on site.
No limits to sign up remotely!
How to join remotely
Join the live stream on YouTube.
This event will be in English.
----
Talk #1
Kickstart Coding at Scale: How Project Template Automation Unlocks Developer Productivity – Adrian Freund, Pavel Zwerschke, Bela Stoyan (QuantCo)
As your company grows, so often does your software landscape. Setting up each repository from scratch quickly leads to a fragmented ocean of project and CI setups that all solve essentially the same problems. To address this, we came up with a standardized, customizable internal project template that we use for all new projects. To operate this effectively at scale, we also came up with a solution to automatically migrate existing projects to newer versions of the template. This lets our developers focus on what they do best, writing code, instead of getting stuck on boilerplate, while still giving them the option to deviate from the standard setup later if needed.
Adrian Freund is a working student in developer tools and is currently studying computer science at Karlsruhe Institute of Technology. One of his most popular OSS contributions is the support for the match statement in mypy.
Pavel Zwerschke is a Data Engineer focused on building platforms for Data Science development. As part of his work, he is working on packaging topics, especially the adoption of pixi in the wider scientific Python community.
Bela Stoyan is technical staff working at the intersection of Data Science and MLOps. He enables Data Scientists to write data pipelines and helps them bring those pipelines to production systems.
Talk #2
Dask DataFrame is fast now - Florian Jetter (Coiled)
Dask is a library for distributed computing with Python that integrates tightly with pandas. Historically, Dask was the easiest choice to use (it's just pandas) but struggled to achieve robust performance (there were many ways to accidentally perform poorly). The re-implementation of the DataFrame API addresses all of the pain points that users ran into. We will look into how Dask is a lot faster now, how it performs on benchmarks that it struggled with in the past, and how it compares to other tools like Spark, DuckDB and Polars.
Florian Jetter leads the Dask Engineering team at Coiled Computing. He is a long-term Dask core maintainer and an expert in distributed cloud computing and data storage.
----
Acknowledgements
Also a big thank you to our sponsors:
- QuantCo, for hosting the meetup.
- PIONEERS HUB, for organising.
Contact
If you have any questions or suggestions, please feel free to contact us via:
- Meetup
- Want to speak? Submit a talk here.
- Interested in hosting an event? Here's our Info-Deck & contact to the organisers!
