Mark your calendar for the next session of the PyData Paris Meetup on March 26th, 2019. This Meetup will be hosted by the Conservatoire National des Arts et Metiers (Cnam), 292 rue Saint-Martin, 75003 Paris.
The speakers for this session are Olivier Grisel, Sarah Diot-Girard, and Stephanie Bracaloni.
7:00pm - 7:15pm: Community announcements
7:15pm - 8:00pm: Olivier Grisel
Scikit-learn: what's new and what's under development
8:00pm - 8:45pm: Sarah Diot-Girard and Stephanie Bracaloni
From ML experiments to production: versioning and reproducibility with MLV-tools
8:45pm - 9:30pm: Standing buffet
* Olivier Grisel:
* Scikit-learn: what's new and what's under development
Scikit-learn is one of the most popular machine learning libraries. This talk will present a selection of recently released features and introduce some new developments, including much more scalable models such as fast histogram-based gradient boosting decision trees, an efficient reimplementation of k-means, and more.
* Sarah Diot-Girard and Stephanie Bracaloni:
* From ML experiments to production: versioning and reproducibility with MLV-tools
You're a data scientist. You have a bunch of analyses you performed in Jupyter Notebooks, but anything older than 2 months is totally useless because it never works right when you open the notebook again. Also, you cannot remember the dropout rate on the second-to-last layer of that convolutional neural network that gave really great results 2 weeks ago and that you now want to deploy into production. Does that ring a bell?
You're a software engineer in a data science team. You can’t imagine life without Git. Reviews on readable files, tests, code analysis, and CI used to be part of your daily routine. You used to think of Jupyter Notebooks only as a demo tool. You need reproducibility for every step of your work, even if you lose a server. And last but not least, you want to be able to deliver to production something usable by anyone. Is there a magical solution?
No! But we can find a compromise that satisfies both worlds...
We had these kinds of issues at PeopleDoc. Building on open-source solutions, we have developed a set of open-source tools and designed a process that works for us. We are thrilled to present our project and we hope to spark a discussion with the community.
See you on GitHub: https://github.com/peopledoc/ml-versioning-tools
Olivier Grisel is a core developer of scikit-learn working at Inria and supported by the scikit-learn initiative at Fondation Inria: https://scikit-learn.fondation-inria.fr/
Sarah Diot-Girard has been working as a machine learning engineer since 2012, and she enjoys finding solutions to engineering problems using data science. She is particularly interested in practical issues, both ethical and technical, that come from applying ML in real life. In the past, she has given talks about data privacy and algorithmic fairness, and she also promotes a DataOps culture.
Stephanie Bracaloni has been working as a software engineer for more than 6 years. She is now working on the industrialization of machine learning projects (from POC to production). She likes development, but she is not “just a coder”: she always keeps systems and projects as a whole in mind. Finding solutions to new problems or improving day-to-day processes is something she really enjoys.