Meetup #4 - Productionizing Machine Learning with Delta Lake, Koalas, and MLflow


121 people went

Rubikloud Technologies Inc.

207 Queens Quay W #801 · Toronto, ON

How to find us

If you enter from Queens Quay West, the correct elevators are farther into the building (south), toward the water, just past the Tim Hortons. Rubikloud is on the 8th floor. You can also call me (Marshall) at (647) 290-0420.


Details

Daniel Arrizza: www.linkedin.com/in/danielarrizza

Daniel is a Customer Success Engineer at Databricks.

For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but crucial) work of performing ETL, building data pipelines, and putting models into production.

In this session, we’ll walk step-by-step through building a production data science pipeline. Using open-source tools, we will:
- Query a data lake with Apache Spark™ and Delta Lake,
- Transform the data with Koalas (the pandas API, running distributed on PySpark),
- Run machine learning experiments with hyperparameter tuning (Hyperopt), and
- Log our experiment results to MLflow.

____________

Schedule:
6:00pm - Check-in, Socialize & Eat Pizza
6:30pm - Productionizing Machine Learning with Delta Lake, Koalas, and MLflow
7:30pm - Q&A
7:55pm - Meetup Conclusion
____________