What we're about

We are a bunch of people passionate about Data Science and about the modern tools available today that make you more productive, allow you to tackle previously inaccessible tasks and generally increase the fun factor in your Data Science projects.

We originally started as the "Vienna Spark Meetup", but after some time we realized that while Spark is a great tool for Big Data problems, there are many small and medium-sized data problems that are better handled with other tools.

We get together because we enjoy sharing our knowledge and learning about interesting new tools, Data Science problems and approaches to solve them.

Upcoming events (1)

Delta Lake and MLflow


As usual, we will have two talks, but this time we'll have only one speaker, and the meetup will be purely online. The YouTube streaming URL will be provided here shortly before the start of the meetup.

Databricks Solutions Architect Stephan Wernz will talk about two relatively new tools open sourced by the creators of Apache Spark at Databricks: Delta Lake and MLflow. Here are the detailed abstracts:

#### Building reliable Data Lakes with Delta Lake
--------------------------------------------------------------------

The widespread adoption of Apache Spark™, the first unified analytics engine, has helped data professionals make great strides in data science and machine learning. Yet their upstream data lakes still face reliability challenges when it comes to building production data pipelines at scale to power these initiatives.

Delta Lake is an open source storage layer that brings reliability to data lakes. Its reliability features include ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of existing data lakes, such as Azure Data Lake Storage, AWS S3, Hadoop HDFS, or on-premises storage, and is fully compatible with Apache Spark APIs.

In this talk we will describe the basic principles of the Delta Lake open source project and demonstrate how to build highly scalable and reliable data pipelines using Delta Lake.

#### Model management with MLflow and Multi-Framework Pipelines
------------------------------------------------------------------------------------------------

ML development brings many new challenges beyond the traditional software development lifecycle: for example, ML developers try a lot of algorithms, tools, and parameters to get the best results, and they need to track all this information to reproduce them.

The MLflow project simplifies the whole ML lifecycle by introducing simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

In this talk, we will show how MLflow can be used to track model parameters and metrics from experiments, package the model to reproduce the runs, and finally put the model in a general format for deployment by building a custom model that combines multiple frameworks.

About the speaker: Stephan is a Solutions Architect at Databricks in Munich, helping customers in the DACH region derive value from their data. After completing his MSc in Statistics at HU-Berlin, he worked as a data science consultant, applying machine learning to a broad range of use cases in marketing and sales across various industries. Before joining Databricks he also worked in big data analytics in retail for the Schwarz-Group.

Schedule
--------------

18:30 - Hang out virtually in the YouTube chat, bring your own snacks and beverages :)
19:00 - "Building reliable Data Lakes with Delta Lake" (Stephan Wernz)
19:45 - "Model management with MLflow and Multi-Framework Pipelines" (Stephan Wernz)
20:30 - Q&A with the speaker, announcements, more virtual hanging out if desired.

Looking forward to seeing you online!

-------------------------------------------------------

This event is sponsored by NOVOMATIC.

NOVOMATIC is the leading provider of gaming technology and casino equipment in Europe. As such, we are constantly applying cutting-edge technologies to develop the most innovative solutions in the industry. Sponsoring the Vienna Data Science Tools Meetup is part of our effort to contribute to the creation of a strong Data Science and Machine Learning community in and around Vienna.

Past events (14)

InfluxDB and the Cloudera Data Science Workbench

Greentube Internet Entertainment Solutions GmbH

Photos (71)