Delta Lake and MLflow

Vienna Data Science Tools

Online event


As usual, we will have two talks, but this time we'll have only one speaker, and the meetup will be purely online. The YouTube streaming URL will be provided here shortly before the start of the meetup.

Databricks Solutions Architect Stephan Wernz will talk about two relatively new tools open sourced by the creators of Apache Spark at Databricks: Delta Lake and MLflow. Here are the detailed abstracts:

#### Building reliable Data Lakes with Delta Lake

The widespread adoption of Apache Spark™, the first unified analytics engine, has helped data professionals make great strides in data science and machine learning. Yet, their upstream data lakes still face reliability challenges when it comes to building production data pipelines at scale to power these initiatives.

Delta Lake is an open source storage layer that brings reliability to data lakes. Its reliability features include ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of existing data lakes, such as Azure Data Lake Storage, AWS S3, Hadoop HDFS, or on-premises storage, and is fully compatible with Apache Spark APIs.

In this talk we will describe the basic principles of the Delta Lake open source project and demonstrate how to build highly scalable and reliable data pipelines using Delta Lake.

#### Model management with MLflow and Multi-Framework Pipelines

ML development brings many new challenges beyond the traditional software development lifecycle: for example, ML developers try a lot of algorithms, tools, and parameters to get the best results, and they need to track all this information to reproduce them.

The MLflow project simplifies the whole ML lifecycle by introducing simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

In this talk, we will show how MLflow can be used to track model parameters and metrics from experiments, package the model so that runs can be reproduced, and finally put the model into a general format for deployment by building a custom model that combines multiple frameworks.

About the speaker:
Stephan is a Solutions Architect at Databricks in Munich, helping customers in the DACH region derive value from their data. After completing his MSc in Statistics at HU Berlin, he worked as a data science consultant, applying machine learning to a broad range of use cases in marketing and sales across various industries. Before joining Databricks, he also worked in big data analytics in retail for the Schwarz Group.


18:30 - Hang out virtually in the YouTube chat, bring your own snacks and beverages :)

19:00 - "Building reliable Data Lakes with Delta Lake" (Stephan Wernz)

19:45 - "Model management with MLflow and Multi-Framework Pipelines" (Stephan Wernz)

20:30 - Q&A with the speaker, announcements, more virtual hanging out if desired.

Looking forward to seeing you online!

This event is sponsored by NOVOMATIC

NOVOMATIC is the leading provider of gaming technology and casino equipment in Europe. As such, we are constantly applying cutting-edge technologies to develop the most innovative solutions in the industry. Sponsoring the Vienna Data Science Tools Meetup is part of our effort to contribute to the creation of a strong Data Science and Machine Learning community in and around Vienna.