Munich Datageeks October Edition


Details
We are incredibly happy to announce our next Meetup on October 26th at NorCom.
Format:
- 2 talks (each ca. 40 min incl. discussion)
- Time for networking + food + drinks before, in between and after the presentations
- Talks are held in English
- We will be taking photos and/or film footage at the event. These will be used to share news about our meetups and to publicize upcoming events.
The lineup:
Talk 1: Jan Hauffa - A Case Study on Retrieval-Augmented Generation for Document Q&A: Experiences and Future Perspectives
Abstract:
Neural language models based on the Transformer architecture have been successfully applied to a wide range of Natural Language Generation tasks, but are held back by their limited context length, that is, their inability to simultaneously “pay attention” to all parts of a long document. Retrieval-Augmented Generation (RAG) is currently the most promising approach to overcome this limitation. By means of semantic similarity search, one can identify the parts of a document that are most relevant to the task at hand, and use only those parts as input to the language model.
In this talk, I demonstrate how RAG can be used to build a system for answering arbitrary questions, posed in natural language, about the content of documents (“document Q&A”). I discuss the challenges we faced when implementing document Q&A at NorCom, how to improve the performance of a document Q&A system, and how to reliably measure the performance in the first place.
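The retrieval step described in the abstract (split a document into parts, rank the parts by semantic similarity to the question, and pass only the best ones to the language model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not NorCom's implementation. The `embed` function is a toy bag-of-words stand-in for the neural sentence-embedding model a real RAG system would use. All function names are hypothetical, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a long document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption: stands in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the reduced input: only the retrieved chunks plus the question."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    chunks = [
        "The invoice must be paid within 30 days of receipt.",
        "Our office is closed on public holidays in Bavaria.",
        "Late payments incur a fee of 2 percent per month.",
    ]
    top = retrieve("When must the invoice be paid?", chunks, k=1)
    print(build_prompt("When must the invoice be paid?", top))
```

Because only the top-k chunks reach the model, the prompt stays within the context length no matter how long the source document is. The trade-off is that an answer can only be as good as the retrieval step.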
Bio:
Jan Hauffa obtained a Diploma degree in Computer Science from the Technical University of Munich and wrote his doctoral dissertation on detecting influence relationships in large-scale social networks. He currently works as a Data Scientist at NorCom, where he keeps trying to make the chaos and complexity of human language amenable to computational processing.
Talk 2: Thomas Schmidt - Revolutionizing SQL Data Model Testing: Introducing SQL-Mock by DeepL
Abstract:
Picture this: you've just committed code to a SQL model, and suddenly, there it is – a devastating bug wreaking havoc in production. We've all been there. You might be thinking, “Yeah, tests could've prevented this, but the testing solutions available just don't cut it.” Well, we've got exciting news for you!
Join us at the Munich Datageeks Meetup for a session that dives deep into the world of SQL testing. We'll explore the existing solutions, shed light on their limitations, and unveil a game-changing solution: SQL-Mock, a library soon to be open-sourced in a public beta by DeepL.
SQL-Mock is set to revolutionize the way you test SQL queries. With Python, you can effortlessly mock input data and create tests for various scenarios. In this session, we'll take you on a thrilling journey through SQL-Mock's capabilities. We'll get hands-on, showing you how to define mock tables, create test data, and generate query results.
Say goodbye to the frustration of SQL bugs in production and hello to efficient and reliable testing. Don't miss this opportunity to elevate your data engineering and analysis game. Join us and be part of the SQL-Mock revolution!
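SQL-Mock's own API is not described in this announcement, so the following is not SQL-Mock code. It is a rough, library-agnostic sketch of the idea the abstract describes: replace the tables a query reads from with small, hand-written mock tables, run the query, and compare the result to the expected rows. It assumes an in-memory SQLite database as a stand-in for the production warehouse. The `orders` table, the revenue query, and the helper names are all hypothetical.

```python
import sqlite3

# Hypothetical SQL model under test: revenue per customer, completed orders only.
REVENUE_QUERY = """
SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY customer_id
"""


def run_with_mock_tables(query: str, mock_tables: dict[str, list[dict]]) -> list[dict]:
    """Load mock rows into a throwaway in-memory SQLite DB, run the query, return rows as dicts."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets us convert result rows to dicts by column name
    try:
        for table, rows in mock_tables.items():
            columns = list(rows[0].keys())
            # SQLite allows untyped columns, which keeps mock table definitions short
            conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(
                f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
                [tuple(row[c] for c in columns) for row in rows],
            )
        return [dict(r) for r in conn.execute(query).fetchall()]
    finally:
        conn.close()


def test_revenue_ignores_cancelled_orders():
    # Mock input data covering one scenario: a cancelled order must not count.
    mock_orders = [
        {"customer_id": 1, "amount": 10.0, "status": "completed"},
        {"customer_id": 1, "amount": 5.0, "status": "cancelled"},
        {"customer_id": 2, "amount": 7.5, "status": "completed"},
    ]
    result = run_with_mock_tables(REVENUE_QUERY, {"orders": mock_orders})
    assert result == [
        {"customer_id": 1, "revenue": 10.0},
        {"customer_id": 2, "revenue": 7.5},
    ]


if __name__ == "__main__":
    test_revenue_ignores_cancelled_orders()
    print("ok")
```

Each additional scenario (empty input, NULL amounts, duplicate orders) becomes another small set of mock rows plus an expected result, so a bug like the one in the opening story would surface before the model reaches production. The test function is also discoverable by pytest as written.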
Bio:
Thomas currently works as a Senior Data Scientist at DeepL.
He started his data science journey while studying Agricultural Science at TUM, where he worked on an early disease identification model for calves on automatic feeders. He spent the first years of his data science career working closely with software engineers, adopting software engineering best practices (including data testing) while building the data stack and team for a Munich startup. After that, he worked at Shopify before joining DeepL, where he built SQL-Mock (the library presented in this talk) on a Hack Friday to solve the problem of missing SQL tests.