Machine Learning Frameworks, Model Management & Operations | DAIS21


Details
**This meetup is hosted on the Data + AI Online Meetup group (https://www.meetup.com/data-ai-online/events/278187825/). Please RSVP there, or use the link below to register directly for Data + AI Summit 2021: https://databricks.com/dataaisummit/north-america-2021
Close out Data + AI Summit 2021 with a live tech talk meetup session with subject matter experts in Machine Learning Frameworks, Model Management and Operations. They will share their insights into ML frameworks, ML platforms, model lifecycle management and operations.
The hosts, Jules Damji and Srijith Rajamohan, will recap the events of the day and share their favorite session picks from the conference.
REGISTER NOW (for FREE) on the Data + AI Summit site: https://databricks.com/dataaisummit/north-america-2021
Talk One
Title: Building a Unified Machine Learning Monitoring Solution in Databricks by Max Fisher
Abstract: Today, many customers use a variety of tools to monitor models in production, which leads to a confusing array of dashboards and reports. This talk will focus on how your team can use the Databricks workspace to unify the monitoring of your models and data for drift, and even to facilitate the retraining part of the ML lifecycle. The demo will cover how each part of the Databricks ecosystem is crucial to building a solution that truly unifies model management in Databricks.
Talk Two
Title: FlexFlow: Automatically Discovering Fast and Scalable Parallelization Strategies for ML Training
Abstract: Existing deep learning frameworks commonly parallelize model training using manually designed strategies (e.g., combinations of data and model parallelism), but these strategies often result in suboptimal parallelization performance due to the increasing complexity of today's DNN models and parallel machine architectures.
FlexFlow (https://flexflow.ai/) is a distributed deep learning engine that supports training DNN models written in PyTorch, TensorFlow Keras, and ONNX. It identifies parallelization dimensions not considered in existing frameworks and automatically discovers fast and scalable parallelization strategies for a specific parallel machine. Companies and national labs are using FlexFlow to train production ML models that do not scale well in current frameworks, achieving over 10x performance improvement.
Talk Three
Tech talk on Hugging Face - details will be added asap!
####
#### Speakers ####
####
** Max Fisher, Solutions Architect at Databricks
Max Fisher is a Solutions Architect at Databricks based out of Chicago. Before joining Databricks, Max worked at Microsoft for three years helping customers build enterprise data platforms on Azure with a primary focus on Azure Databricks and the broader Azure Data + AI stack. When Max is not busy helping customers build on Databricks, he can be found running up and down Chicago's lakefront, reading, or obsessing over college basketball (Go Illini!).
** Zhihao Jia, Research Scientist, Facebook
Zhihao Jia is a research scientist at Facebook and will join CMU as an assistant professor of computer science in Fall 2021. He obtained his Ph.D. from Stanford, working with Alex Aiken and Matei Zaharia. His research interests lie at the intersection of computer systems and machine learning, with a focus on building efficient, scalable, and high-performance systems for ML computations.
