Professional ML Platforms require solid infrastructure. Setting up such infrastructure is a difficult task and an interesting use case for Cloud Native technologies. The talks will give an update on what to use today.
18:30: Doors open. Have a snack, grab a drink
19:00: Talks (abstracts below)
- The case for a common Metadata Layer for Machine Learning Platforms
- Building ML Pipelines with DCOS
- ML pipelines with Big Data
21:00: Have more drinks and snacks, and get in touch with the speakers (and other attendees)
*** The case for a common Metadata Layer for Machine Learning Platforms (Jörg Schad, ArangoDB) ***
With the rapid rise of data science in recent years, the Machine Learning Platforms being built are becoming more complex. For example, consider the various Kubeflow components: Distributed Training, Jupyter Notebooks, CI/CD, Hyperparameter Optimization, Feature Store, and more. Each of these components produces metadata: different datasets (and versions of them), different versions of Jupyter notebooks, different training parameters, test/training accuracy, different features, model serving statistics, and many more.
For production use, it is critical to have a common view across all this metadata, because we have to answer questions such as: Which Jupyter notebook was used to build model xyz currently running in production? If there is new data for a given dataset, which models (currently serving in production) have to be updated?
In this talk, we look at existing implementations, in particular MLMD as part of the TensorFlow ecosystem. Further, we propose a first draft of an (MLMD-compatible) universal Metadata API. We demo the first implementation of this API using ArangoDB.
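The two lineage questions raised in this abstract can be made concrete with a small sketch. It is not the Metadata API proposed in the talk, nor MLMD or ArangoDB code: it is a toy in-memory graph, and every name in it (`LineageGraph`, `add_artifact`, `add_edge`, the artifact names, the `in_production` flag) is a hypothetical chosen for illustration. It only assumes that metadata can be modeled as artifacts connected by "was derived from" edges.

```python
from collections import defaultdict


class LineageGraph:
    """Toy metadata store (hypothetical): artifacts are nodes, and an edge
    source -> target means 'target was derived from source'."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> derived artifacts
        self._upstream = defaultdict(set)    # derived artifact -> its sources
        self._attrs = {}                     # artifact name -> attributes

    def add_artifact(self, name, kind, **attrs):
        self._attrs[name] = {"kind": kind, **attrs}

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _reachable(self, start, edges):
        # Iterative depth-first traversal; 'seen' guards against cycles.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def notebooks_for(self, model):
        """Which notebooks were (transitively) used to build this model?"""
        found = self._reachable(model, self._upstream)
        return sorted(n for n in found if self._attrs[n]["kind"] == "notebook")

    def models_to_update(self, dataset):
        """Which production models (transitively) depend on this dataset?"""
        found = self._reachable(dataset, self._downstream)
        return sorted(
            n for n in found
            if self._attrs[n]["kind"] == "model"
            and self._attrs[n].get("in_production", False)
        )


# Illustrative data only; none of these artifacts come from the talk.
graph = LineageGraph()
graph.add_artifact("dataset-a", "dataset")
graph.add_artifact("train.ipynb", "notebook")
graph.add_artifact("features-v1", "features")
graph.add_artifact("model-xyz", "model", in_production=True)
graph.add_artifact("model-old", "model", in_production=False)

graph.add_edge("dataset-a", "features-v1")
graph.add_edge("features-v1", "model-xyz")
graph.add_edge("train.ipynb", "model-xyz")
graph.add_edge("dataset-a", "model-old")

print(graph.notebooks_for("model-xyz"))
print(graph.models_to_update("dataset-a"))
```

Both questions reduce to a graph traversal, upstream for provenance and downstream for impact, which is one reason a common metadata layer is often modeled as a graph.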
Jörg is Head of Machine Learning at ArangoDB. In a previous life, he worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research on distributed databases and data analytics.
*** Building ML Pipelines with DCOS (Emil A. Siemes, Mesosphere) ***
Tired of managing infrastructure instead of creating exciting ML models? Learn what DC/OS can do for the data scientist.
Emil is interested in building, running, and managing the next generation of distributed and data-driven web and mobile applications. After several years as a Java Architect with Sun Microsystems, Aplix, Wily, SpringSource (VMware), and Hortonworks, Emil joined Mesosphere, where he helps customers modernize their applications with container, fast data, and big data technologies as well as ML and AI.
*** ML pipelines with Big Data (Steffen Grohsschiedt, Logical Clocks) ***
Machine Learning (ML) pipelines are the fundamental building block for productionizing ML code. Building such pipelines with Big Data is a complex process. The different stages in ML pipelines also need to be orchestrated, from data ingestion and data transformation, to feature engineering, to model training, serving and monitoring.
Hopsworks is an open-source data platform that can be used to both develop and operate horizontally scalable machine learning (ML) pipelines. A key part of our pipelines is the world's first open-source Feature Store, which acts as a data warehouse for features. It provides a natural API between data engineers, who write feature engineering code, and data scientists, who select features from the feature store to generate training/test data for models.
Steffen is the Head of Cloud at Logical Clocks, driving the cloud development of the Hopsworks platform. Before joining Logical Clocks, he worked as a senior data engineer at Spotify, where he built a state-of-the-art reporting platform after having operated and finally migrated one of Europe's largest Hadoop installations to Google Cloud.
Note: Photos will be taken during the meetup. If you do not want to appear in the photos, please let the event team know at the start of the meetup.