Simplifying the Machine Learning Lifecycle with MLflow and Koalas

Details

Join us for a fun data science event focusing on two open-source data science technologies: MLflow and Koalas. While there will be slides, this will be a demo-heavy meetup!

MLflow: Managing the Machine Learning Lifecycle
In this session, we will discuss the Machine Learning lifecycle and the challenges associated with it. The fundamental problem is that data and ML, whether the people or the technology, are often siloed from each other. In these silos, it becomes next to impossible for data practitioners to standardize their ML lifecycle, from preparing the data, to building the model, to deploying it. With MLflow, you and your teams can break down these walls and ensure that you can build, reproduce, and repeat your ML pipelines.

Koalas: Unifying Spark and pandas API
pandas is very popular for data manipulation and analysis in Python. It is deeply integrated within the Python data science ecosystem (think sklearn, numpy, matplotlib, etc.). It handles many situations well, but it cannot easily scale beyond a single node. With Koalas, you get the distributed power of Apache Spark using the familiar (and powerful) pandas API. This allows data scientists to seamlessly transition from small data to large data.

Agenda:
6:00pm-6:15pm: Welcome
6:15pm-7:00pm: MLflow: Managing the Machine Learning Lifecycle
7:00pm-7:45pm: Koalas: Unifying Spark and pandas API
7:45pm-8:00pm: Q&A and Wrap-up

Speakers:
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has a Master of Biomedical Informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.