Manage the Machine Learning Lifecycle with MLflow / Effective Data Visualization

Details
After a long hiatus, the SF PyData meetup group is BACK! Starting Sept 6th, we're returning to regular, in-person meetups in San Francisco.
The usage of Python and PyData tools has grown explosively over the last few years, and we're very excited to start building a community where data lovers from all across the Bay Area can meet, connect, and learn from each other.
To kick off the new series, we've got a great lineup of speakers who'll teach us about 1) managing the complete machine learning lifecycle and 2) producing clear, effective visualizations of scientific data.
## Agenda:
6:00 - 6:45pm: Mingling
6:45 - 6:50pm: Opening remarks
6:50 - 7:35pm: Tech-Talk-1: MLflow: Infrastructure for a Complete Machine Learning Life Cycle
7:35 - 8:05pm: Tech-Talk-2: Data Visualization for Scientific Discovery
8:05 - 8:30pm: Mingling
8:30: Event over!
Many thanks to Cloudflare for volunteering to host.
Doors will close at 7:15pm, so please arrive before then.
## MLflow: Infrastructure for a Complete Machine Learning Life Cycle
Abstract:
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but these platforms are limited to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
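To make the "track results" idea concrete, here is a minimal sketch of experiment tracking using only the Python standard library. It is not MLflow's API: the `RunTracker` class, its method names, and the learning-rate and accuracy values are all hypothetical placeholders chosen for illustration, not results from any real model.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class RunTracker:
    """Hypothetical tracker (not MLflow): one JSON file per training run."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params, metrics):
        """Store the parameters and metrics of one run and return its ID."""
        run_id = uuid.uuid4().hex
        record = {
            "run_id": run_id,
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        }
        (self.root / f"{run_id}.json").write_text(json.dumps(record))
        return run_id

    def best_run(self, metric):
        """Return the recorded run with the highest value of `metric`."""
        runs = [json.loads(p.read_text()) for p in self.root.glob("*.json")]
        return max(runs, key=lambda r: r["metrics"][metric])


# Placeholder values standing in for real training runs.
tracker = RunTracker(tempfile.mkdtemp())
for learning_rate, accuracy in [(0.1, 0.81), (0.01, 0.88), (0.001, 0.84)]:
    tracker.log_run({"learning_rate": learning_rate}, {"accuracy": accuracy})

best = tracker.best_run("accuracy")
print(best["params"], best["metrics"])
```

Recording every run's parameters next to its metrics is what lets you go back later and reproduce the best configuration, which is the problem the abstract describes.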
Speaker Bios:
Mani Parkhe is an ML/AI Platform Engineer at Databricks, working on customer-facing and open source platform initiatives to enable data discovery, training, experimentation, and deployment of ML models on the cloud. He has also worked on various data-intensive batch and stream processing problems at LinkedIn and Uber.
Andrew Chen is a software engineer at Databricks and an MLflow committer. Andrew is working on tools to simplify the end-to-end experience of machine learning, all the way from data ETL to model training and deployment. Before working at Databricks, Andrew received his BS in EECS from UC Berkeley in 2016.
## Data Visualization for Scientific Discovery
Abstract:
Choosing the visual form for a visualization is a decision about what aspects of the data matter most. Highlight or ignore outliers? Look at values, differences, or changes? In data analysis we risk missing discoveries by failing to notice important features of our data, yet we often use default parameters and charts without realizing what we might miss. I will demonstrate how to translate questions about your data into chart parameters, taking into account your context, goals, and constraints. Using Python examples, I'll illustrate powerful techniques like using color intentionally, creating 'small multiples' of charts that vary visual form or data, and optimizing for your time, energy, and attention.
Speaker Bio:
Zan Armstrong is a data visualization engineer and designer. With a background in data analysis, she is especially fascinated by identifying what characteristics of the data might be most important and then creating ways to reveal those characteristics visually. She has won an Information is Beautiful award for work published in Scientific American, and a tool she worked on was part of SFMOMA's Designed in California exhibit. Zan's primary tools include JavaScript, R, and Python.
