PyData Montreal Meetup #13 (ONLINE event)

PyData Montreal

Online Event

This is an online event with an external registration form.

PyData Montreal meetup #13:
- "Managing the Complete Machine Learning Lifecycle with MLflow" by Jules Damji
- "Sampling for big data and application to networks and recommendations" by Antoine Rebecq

All times in EDT
6:30 pm — Zoom meeting starts; Introductions
6:40 pm — Managing the Complete Machine Learning Lifecycle with MLflow

Machine Learning development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.
In this session, we introduce MLflow, a new open-source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

About Jules:
Jules S. Damji is a Developer Advocate at Databricks and an MLflow contributor. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/Loudcloud, VeriSign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc and M.Sc in Computer Science (from Oregon State University and Cal State, Chico, respectively) and an MA in Political Advocacy and Communication (from Johns Hopkins University).

7:20 pm — Break
7:30 pm — Sampling for big data and application to networks and recommendations

Statistical sampling is an often underestimated but very valuable tool for big data. We'll introduce the basics of how some algorithms can efficiently create a small sample of a big dataset with good statistical properties. Such samples can then be used to speed up a wide variety of applications, from simple descriptive analytics to machine learning. We'll then focus on an application to graph datasets. Statistical analysis of graph data is increasingly popular in tech and can provide powerful insights, for example on social networks or recommendation algorithms. Unfortunately, graph data processing is often very costly and hard to scale. We'll show that sampling can be successfully applied to graphs as well, and present results from a state-of-the-art graph sampling algorithm on a recommender problem.
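
To make "a small sample of a big dataset with good statistical properties" concrete, here is a minimal sketch of reservoir sampling (Algorithm R), a classic way to draw a uniform sample of fixed size from a stream in a single pass. It is only an illustration and not necessarily one of the algorithms covered in the talk; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration: one pass, O(k) memory,
    and no need to know the stream length in advance.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Estimate the mean of a large range from a 1,000-item sample.
    sample = reservoir_sample(range(1_000_000), k=1000, seed=42)
    print("sample size:", len(sample))
    print("estimated mean:", sum(sample) / len(sample))
```

Because every item ends up in the sample with the same probability, simple statistics computed on the sample (means, proportions) are unbiased estimates of the same statistics on the full dataset.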

About Antoine:
Antoine Rebecq holds a PhD in Statistics. After working as a data scientist in various settings, he joined the methodology department at Insee (the French equivalent of Statistics Canada), where he developed methods to produce and validate robust official figures for the French government. Part of his work consisted of research on the interfaces between classical statistics and industrial big data problems. He then moved to beautiful Montreal and its dynamic data and machine learning scene, where he led a machine learning engineering division at Ubisoft Montreal. He now works for Shopify, leading a team that applies full-stack data science methods to help develop an e-commerce platform that offers merchants everything they need to sell online, on social media, or in person.

8:30 pm — Final notes