PyData 2022 August


Welcome to the August edition of PyData Berlin!
So that everybody feels safer, we recommend that you test yourself for COVID-19 before coming to the event. A self-test or a rapid antigen test is sufficient. Please refrain from coming to the event if you feel unwell.
***
Talks:
Morena Bastiaansen
Data leakage: a silent killer in real-time machine learning.
Data leakage is a classic pitfall in machine learning. It is a sneaky and subtle issue that can have a major business impact. Your model's performance can look completely fine in the testing and validation phase, yet the model can turn out to be worthless once deployed to production. This talk covers the definition of data leakage, why it matters to real-time models specifically, and strategies to detect and avoid data leakage.
Bio
Besides her work as a data scientist at GetYourGuide, Morena enjoys sharing knowledge within the field on how to build and use machine learning models effectively. She is also a co-organizer of the MLOps Community meetups in Berlin.
Break
Kevin Klein
Datajudge - Express and test specifications against data from database.
Ensuring data quality is of great importance for many use cases, and Datajudge seeks to make this convenient. Datajudge is a library for expressing expectations about data stored in databases. In particular, it allows for comparing different data sources, a feature missing from popular alternatives. It also provides functionality to compare data from a single data source against fixed reference values derived from explicit domain knowledge. Rather than reinventing the wheel, Datajudge relies on pytest to execute the data expectations.
https://github.com/Quantco/datajudge
https://tech.quantco.com/2022/06/20/datajudge.html
Bio
Kevin sits between Data Science and Machine Learning Engineering at QuantCo, working on fraud detection and pricing. Prior to that, he did research on Bayesian Optimization and Natural Language Processing at ETH in Zurich.
***
NumFOCUS Code of Conduct
THE SHORT VERSION
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS.
All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery is not appropriate.
NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form.
Thank you for helping make this a welcoming, friendly community for all.
If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct
***
SPONSORS:
Back Market
Founded in 2014 in France, Back Market is the world's first online marketplace dedicated exclusively to second-hand devices. It aims to give consumers the shopping experience and advantages of buying new, but at a lower price.
- Website: https://www.backmarket.com/
- Jobs: https://jobs.backmarket.com/
- LinkedIn: https://www.linkedin.com/company/back-market/
Jina AI
Jina is a deep learning-powered search framework for building cross-/multi-modal search systems (e.g., text, images, video, audio) on the cloud.
- Website: https://jina.ai
- GitHub: https://github.com/jina-ai/jina
- Twitter: https://twitter.com/JinaAI_
- LinkedIn: https://www.linkedin.com/company/jinaai/mycompany/
Ahoy Berlin coworking space
https://www.ahoyberlin.com/
***
How to find us:
The meetup takes place at the Ahoy Berlin coworking space (https://www.ahoyberlin.com). You can find us on the 1. OG (first floor), on the right side.