Data Engineering Berlin is a meetup series focussed on knowledge sharing and learning for the local data engineering community. Recordings of previous sessions can be found on our YouTube channel: https://www.youtube.com/channel/UCxwul7aBm2LybbpKGbCOYNA
For our 7th installment of the series we have a special guest in Matei Zaharia. Matei is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009. Today, Matei tech-leads the MLflow development effort at Databricks.
18:00 Doors Open: drinks and discussions
18:30 Keynote - Matei Zaharia, Simplifying Production Machine Learning with MLflow
19:10 Jan van der Vegt, Dynamically generating schemas from arbitrary objects
19:30 Break: food, drinks, and discussions
20:00 Felix Bert / Daniel Germanus, Predictive maintenance and condition monitoring for remote heavy machinery
20:30 Thomas Santana / Iaroslav Fadin, k-anonymization or why aggregation is not enough
21:45 Event End
Keynote - Matei Zaharia, Databricks
Simplifying Production Machine Learning with MLflow
Building and deploying a machine learning model can be difficult to do once. Enabling other engineers (or yourself, one month later) to reproduce your pipeline, to compare the results of different versions, to track what’s running where, and to redeploy and roll back updated models is much harder. In this talk, I’ll introduce MLflow, a new open source project launched by Databricks that simplifies the machine learning lifecycle.
Jan van der Vegt, Cubonacci
Dynamically generating schemas from arbitrary objects
Understanding the DNA of the data that an application is processing can be essential. Knowing the corresponding schemas can help with dynamically generating API contracts or learning meta models like anomaly detection. Instead of forcing users to supply a schema, it's possible to inspect the data and dynamically generate schemas.
In this talk I will first show the concept behind how Cubonacci generates schemas dynamically from Python objects that our users pass to us. After that I will show how this specific approach scales well within standard data engineering frameworks.
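For readers who want a feel for the idea before the talk, here is a minimal sketch of inferring a schema by inspecting a Python object. It is an illustration only and not Cubonacci's implementation: the function name `infer_schema` and the JSON-Schema-like output layout are assumptions made for this example.

```python
from typing import Any


def infer_schema(value: Any) -> dict:
    """Return a JSON-Schema-like description of a plain Python object.

    Hypothetical helper for illustration; only handles basic built-in types.
    """
    if value is None:
        return {"type": "null"}
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, dict):
        # Recurse into each field of the mapping.
        return {
            "type": "object",
            "properties": {str(k): infer_schema(v) for k, v in value.items()},
        }
    if isinstance(value, (list, tuple)):
        # Collect the distinct schemas seen among the elements.
        item_schemas = []
        for element in value:
            schema = infer_schema(element)
            if schema not in item_schemas:
                item_schemas.append(schema)
        if not item_schemas:
            return {"type": "array"}  # empty sequence: element type unknown
        if len(item_schemas) == 1:
            return {"type": "array", "items": item_schemas[0]}
        return {"type": "array", "items": {"anyOf": item_schemas}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    print(infer_schema({"id": 1, "tags": ["a", "b"], "active": True}))
```

A production approach would also need to merge schemas across many records and handle library-specific types, which this sketch leaves out.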
Felix Bert / Daniel Germanus, Deutsche Bahn
Predictive maintenance and condition monitoring for remote heavy machinery
Predictive maintenance and condition monitoring for remote heavy machinery are compelling ways to reduce maintenance costs and increase availability. This work presents a condition monitoring platform built entirely from open-source software. A real-world industry example, an escalator use case from Deutsche Bahn, underlines the advantages of this approach.
This talk highlights the challenges and lessons learned in building the platform and the high-level aggregation for our alerting system.
Thomas Santana / Iaroslav Fadin, Zalando
k-anonymization or why aggregation is not enough
Sharing data with partners is integral to Zalando's strategy, but it must be done in a way that respects customer privacy. Aggregation alone is not enough to ensure privacy. This talk will show how Consumer Insight implemented k-anonymization to allow sharing data with partners while protecting customer data. We discuss what k-anonymization is, how we implemented it, and the challenges it poses.
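As background, a dataset is commonly called k-anonymous when every combination of quasi-identifier values (attributes that could identify a person when combined) is shared by at least k records. The sketch below checks that property and suppresses groups that are too small. It is not the implementation presented in the talk: the function names, field names, and the suppression strategy are assumptions chosen for illustration.

```python
from collections import Counter


def group_key(record, quasi_identifiers):
    """Build the tuple of quasi-identifier values for one record."""
    return tuple(record[q] for q in quasi_identifiers)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return all(count >= k for count in counts.values())


def suppress_small_groups(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier group has fewer than k members."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[group_key(r, quasi_identifiers)] >= k]


if __name__ == "__main__":
    # Hypothetical example data; "zip" and "age_band" act as quasi-identifiers.
    rows = [
        {"zip": "101", "age_band": "20-29", "spend": 10},
        {"zip": "101", "age_band": "20-29", "spend": 20},
        {"zip": "102", "age_band": "30-39", "spend": 30},
    ]
    print(is_k_anonymous(rows, ["zip", "age_band"], 2))
    print(len(suppress_small_groups(rows, ["zip", "age_band"], 2)))
```

Suppression is only one option; generalizing values (for example, widening age bands) is another common way to reach the required group size while keeping more records.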
Zalando Code of Conduct:
Zalando is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, nationality, cultural background, religion or lack thereof. We do not tolerate harassment of attendees in any form. Offensive and sexual language and imagery is not welcome at our events. Participants violating these rules may be asked to leave at the discretion of the event organisers.