Open-source tools to version control Machine Learning models and experiments


Details
AI and ML are becoming an essential part of everyday engineering and data science workflows. ML teams need new tools for data versioning, ML pipeline versioning, experiment metrics visualization, and more.
Do you have the tools to successfully version data and ML pipelines, visualize experiments, and more? Come and join us for a discussion on the best ML practices!
Agenda
6:30 - 7:00pm – Networking and Refreshments
7:00 - 7:40pm – Talk on ML/DL version control by Dmitry Petrov, creator of DVC.org - Git for data.
7:40 - 8:00pm – Talk on Meta-Graphs for Complex Processing Pipelines by Dan Fischetti, Head of Research at Standard Cognition.
8:00pm – Q&A & open discussion
Talk 1: ML/DL Version Control: version control your models, data, code, and more.
In this talk, Dmitry Petrov will discuss:
- The current practices of organizing ML projects using open-source tools like Git, MLflow, and DVC.org.
- How to version datasets with dozens of gigabytes of data and version ML models.
- How to use your favorite cloud storage (S3, GCS, or bare metal SSH server) as a data file backend and how to embrace the best engineering practices in your ML projects.
Bio: Dmitry is a creator of the open-source tool Data Version Control - DVC.org. He is a former data scientist at Microsoft with a Ph.D. in Computer Science. Dmitry now works on tools for machine learning and data versioning as a Co-Founder and CEO of Iterative.AI in San Francisco.
Talk 2: Meta-Graphs for Complex Processing Pipelines
In this talk, Dan Fischetti, who is working on the scene comprehension problem in retail, will show:
- A use case with a complex cascade of ML models in a real-time processing pipeline.
- A "meta" graph system to annotate Python code to capture the data types that a function consumes and produces, allowing for scripting of arbitrarily complex operations to arrive at a consistent set of data.
Bio: Dan Fischetti is the Head of Research at Standard Cognition, an autonomous checkout company tackling the scene comprehension problem in retail environments with only overhead RGB cameras. Prior to co-founding Standard, Dan worked on data analysis at the HAL (High-frequency Analytics Lab) at the SEC, munging large columnar datasets with many derived features that required strict audit trails.
