Daniel Arrizza: www.linkedin.com/in/danielarrizza
Daniel is a Customer Success Engineer at Databricks.
For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but crucial) work of performing ETL, building data pipelines, and putting models into production.
In this session, we’ll walk through the process of building a production data science pipeline step by step. Using open-source tools, we will:
- Query a data lake with Apache Spark™ and Delta Lake
- Transform the data with Koalas (the pandas API on Apache Spark)
- Run machine learning experiments with hyperparameter tuning (Hyperopt), and
- Log our experiment results to MLflow.
6:00pm - Check-in, Socialize & Eat Pizza
6:30pm - Productionizing Machine Learning with Delta Lake, Koalas, and MLflow
7:30pm - Q&A
7:55pm - Meetup Conclusion