Meetup #4 - Productionizing Machine Learning with Delta Lake, Koalas, and MLflow

Hosted By
Marshall B.

Details

Daniel Arrizza: www.linkedin.com/in/danielarrizza

Daniel is a Customer Success Engineer at Databricks.

For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but crucial) work of performing ETL, building data pipelines, and putting models into production.

In this session, we’ll walk through the process of building a production data science pipeline step by step. Using open-source tools, we will:

  • Query a data lake with Apache Spark™ and Delta Lake
  • Transform the data with Koalas (the pandas API on Apache Spark)
  • Run machine learning experiments with hyperparameter tuning (Hyperopt), and
  • Log our experiment results to MLflow.

____________

Schedule:
6:00pm - Check-in, Socialize & Eat Pizza
6:30pm - Productionizing Machine Learning with Delta Lake, Koalas, and MLflow
7:30pm - Q&A
7:55pm - Meetup Conclusion
____________

Toronto Apache Spark TAS 2.0
Kinaxis
207 Queens Quay W #801 · Toronto, ON