
Open source tools for machine learning models and dataset versioning

Hosted by Erik Petzold

Details

AI and ML are becoming an essential part of software engineering, but the traditional engineering toolset does not fully cover the needs of machine learning teams. These teams need new tools for data versioning, ML pipeline versioning, ML model versioning, experiment metrics tracking, and more.

First talk: Open source tools for machine learning models and dataset versioning

In this talk we will discuss:

  • Current practices for organizing ML projects with open-source tools such as Git, MLflow, and DVC.org.
  • The motivation for creating DVC, a data-specific versioning system.
  • How to version datasets with dozens of gigabytes of data and version ML models.
  • How to use your favorite cloud storage (S3, GCS) or a bare-metal SSH server as a data file backend.
  • How to embrace the best engineering practices in your ML projects.
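
The idea of versioning large datasets next to Git can be illustrated with a minimal Python sketch. This is not DVC's actual implementation: the names `snapshot` and `restore`, the cache layout, and the `.ptr.json` pointer format are hypothetical and chosen only for illustration. The sketch stores each data file under its content hash and leaves behind a small pointer file, which is the part that would be committed to Git.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    Hypothetical helper: the pointer format is an assumption, not DVC's.
    """
    digest = file_digest(data_file)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        shutil.copyfile(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file version that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copyfile(cache_dir / meta["sha256"], target)
    return target
```

Because only the small pointer file changes between versions, checking out an older commit and calling `restore` would bring back the matching data. In this sketch the cache is a local directory; a cloud bucket or SSH server could play the same role.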

Bio:
Dmitry is the creator of the open-source tool Data Version Control (DVC.org), also described as Git for data. He is a former data scientist at Microsoft with a Ph.D. in Computer Science. Dmitry now works on tools for machine learning and data versioning as Co-Founder and CEO of Iterative.AI in San Francisco, CA.

Second talk: DVC hands-on

Abstract:
In a live demo of DVC we explore its core functionality.

  • We set up a machine learning pipeline and, as we continue to develop it, track its versions together with all relevant data and metrics.
  • For updated input data, we let DVC take care of reproducing the affected parts of our pipeline.
  • Then we demonstrate how to share the pipeline and its data with teammates.
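
The second point, reproducing only the affected parts of a pipeline, can be sketched in a few lines of Python. This is a toy illustration and not how DVC is implemented: `run_if_changed`, the stage name, and the JSON state file are hypothetical. The sketch records a hash of every input of a stage and re-runs the stage only when one of those hashes has changed.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hash of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(
    name: str,
    deps: Iterable[Path],
    action: Callable[[], None],
    state_file: Path,
) -> bool:
    """Run `action` only if a dependency changed since the last recorded run.

    Hypothetical helper: returns True when the stage was executed.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(p): file_digest(p) for p in deps}
    if state.get(name) == current:
        return False  # inputs unchanged, the stage is up to date
    action()
    state[name] = current  # remember the input hashes for the next run
    state_file.write_text(json.dumps(state, indent=2))
    return True
```

Calling `run_if_changed` twice in a row on unchanged inputs executes the stage only once; after an input file is updated, the next call runs it again.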

Bio:
Bert worked as a systems engineer on the FAIR particle accelerator control system, then earned his PhD researching matching heuristics. Today he is a software engineer at codecentric and has recently developed an interest in machine learning.

Agenda

18:00 Doors open
18:30 Talk 1
19:15 Talk 2
20:00 Networking

codecentric Berlin
codecentric AG
Köpenicker Str 31 · Berlin