Deploying Machine Learning Models to Production at Scale


Join us for the next Apache Spark London Meetup! This time we have a session focussed on machine learning. As usual, there will be food and refreshments, an opportunity to network, and some great talks. So join us for an evening of Apache Spark + AI!

Title: Understanding Depop's Inventory: Our Journey into Image Recognition

Speaker: Clemence Burnichon (Depop)

At Depop, our 15 million+ users can list items for sale with up to 4 images and a short description. To understand our inventory, we have built the ability to extract information from images and free text by developing machine learning models and deploying them in production. In this talk, I will walk you through the journey we took to develop these capabilities, from solution design to deployment in production.

Clemence leads the machine learning effort across Depop, specialising in building smart capabilities to improve our buyers' and sellers' experience with Depop. During her 7 years as a data scientist, she has gained experience in multiple ML fields, including computer vision, recommendation engines, search and NLP. Prior to Depop, she worked at Net-a-porter and Sainsbury’s.

Title: Managing the Machine Learning Lifecycle at scale with MLflow

Speaker: Matt Thomson (Databricks)

Typically, when we talk about distributed Machine Learning, we talk about building models on bigger and bigger datasets, using the extra information contained in that data to build better and better models. Apache Spark with SparkML makes this process very simple. However, we increasingly see a requirement to train a large number of relatively small models. In this talk we will discuss how you can use the PySpark PandasUDF method to parallelise the training of any number of models to solve this problem. What's more, when building models at this scale, model management becomes a real challenge, and we will demonstrate how MLflow can help address it. Finally, we will discuss how the same method can be used to parallelise hyperparameter tuning for models as well.

Matt Thomson has been at Databricks in London since April 2018 and leads the Machine Learning practice for Resident Solutions Architects in EMEA. His team works with customers to help them build and deliver their Machine Learning and Big Data/Spark applications, from inception and architecture design through to implementation and production. Previously, Matt was Head of Data Science for Credit Card Strategic Analytics within a global bank, developing machine learning models to optimise customer targeting, and as a consultant he helped develop the ML capability for a UK Government department. Matt cut his teeth in data science during his PhD in Astrophysics, studying the evolution of distant galaxies.