Details

Join Oz Katz, creator of lakeFS, the open source Git-like repository for data lakes, as he dives into how to build reproducible ML processes with a common open source stack.
---
Machine learning experiments consist of Data + Code + Environment. While MLflow Projects are a great way to ensure the reproducibility of data science code, they cannot ensure the reproducibility of the input data used by that code.

In this talk, we'll go over the trifecta required for truly reproducible experiments: Code (MLflow and Git), Data (lakeFS), and Environment (Infrastructure-as-code).

This talk will include a hands-on code demonstration of reproducing an experiment while ensuring we use the exact same input data, code, and processing environment as a previous run. We will demonstrate programmatic ways to tie all the moving parts together: from creating commits that snapshot the input data, to tagging and traversing the history of both code and data in tandem.
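
To give a rough feel for what "tying the moving parts together" can look like, here is a minimal sketch, not the code from the talk. It assumes a run is pinned by three identifiers: a Git commit, a lakeFS commit, and some environment reference such as a container image tag. The `ExperimentSnapshot` class, its field names, and all the example values are hypothetical; only the `lakefs://<repository>/<ref>/<path>` URI layout comes from lakeFS itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSnapshot:
    """Hypothetical record pinning the code, data, and environment of one run."""

    git_commit: str       # code version (Git commit hash)
    lakefs_repo: str      # lakeFS repository holding the input data
    lakefs_commit: str    # lakeFS commit ID that snapshots the input data
    environment_ref: str  # assumed environment pin, e.g. a container image tag

    def data_uri(self, path: str) -> str:
        """Address input data at the pinned commit rather than a moving branch."""
        return f"lakefs://{self.lakefs_repo}/{self.lakefs_commit}/{path.lstrip('/')}"

    def as_tags(self) -> dict[str, str]:
        """Flatten the record into string tags that can be attached to a run."""
        return {
            "git_commit": self.git_commit,
            "lakefs_repo": self.lakefs_repo,
            "lakefs_commit": self.lakefs_commit,
            "environment_ref": self.environment_ref,
        }


# Placeholder values for illustration only.
snapshot = ExperimentSnapshot(
    git_commit="a1b2c3d",
    lakefs_repo="example-repo",
    lakefs_commit="e4f5a6b",
    environment_ref="example-image:1.0",
)
print(snapshot.data_uri("datasets/train.parquet"))
print(snapshot.as_tags())
```

In a tracked experiment, a dictionary like the one returned by `as_tags()` could be passed to `mlflow.set_tags` inside an active run, so that a later reproduction can look up exactly which code and data commits to check out.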

Grab your spot here: https://info.lakefs.io/reproducible_ml_20230907

Related topics

Artificial Intelligence
Automated Machine Learning
Machine Learning
Big Data
Data Visualization