PyData Cambridge - 24th Meetup

Hosted by Ole Schulz-Trieglaff and 3 others

Details

We are happy to announce the 24th PyData Cambridge meetup!

IMPORTANT

Due to COVID-19 social distancing measures, this edition will be hosted online using the Zoom platform.

Agenda

19:00 - Introduction
19:15 - "Breaking down data silos with collective machine learning" -- Emma Smith and Juan Besa (fetch.ai)
19:45 - "MLflow and managing the distributed end-to-end machine learning life-cycle" -- Matthew Thomson (Databricks)
20:15 - End

Code of Conduct

PyData is dedicated to providing a harassment-free event experience for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of participants in any form.

The PyData Code of Conduct (http://pydata.org/code-of-conduct.html) governs this meetup. To discuss any issues or concerns relating to the code of conduct or the behavior of anyone at a PyData meetup, please contact NumFOCUS Executive Director Leah Silen (leah@numfocus.org) or the organizers.

Talks

Breaking down data silos with collective machine learning

Abstract: The recent success of machine learning (ML) algorithms in the deep learning era has depended on the ever-increasing size and quality of training data. This has greatly expanded the scope of what ML can achieve, but at the cost of centralizing most of its potential and financial rewards in the hands of large organisations. In this presentation, we describe a system that uses blockchain technology to enable multiple peers to train ML models without requiring them to share the underlying data with each other. The blockchain serves as a means to record data provenance, audit trails, financial incentives and codified governance mechanisms that enable the different stakeholders to be coordinated for their collective benefit. The talk describes the collective learning protocol, its implementation in a smart contract and its application to the healthcare industry.

Bio: TBA

MLflow and managing the distributed end-to-end machine learning life-cycle

Abstract: Typically, when we talk about distributed machine learning, we talk about building models on bigger and bigger datasets, using the extra information in that data to build better and better models; Apache Spark with SparkML makes this process very simple. However, here at Databricks we increasingly see a requirement to train a large number of relatively small models. In this talk we will discuss how you can use PySpark pandas UDFs to parallelise the training of any number of models to solve this problem. What's more, model management becomes a real challenge when building models at this scale, and we will demonstrate how MLflow can help solve it. Finally, we will discuss how the same method can be used to parallelise hyperparameter tuning.
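
The "many small models" pattern in the abstract can be sketched roughly as follows. This is not the speaker's code, and it deliberately avoids Spark: a standard-library thread pool stands in for the cluster, and the names `fit_line`, `train_per_group` and the toy `store_a`/`store_b` data are hypothetical. In PySpark the same per-group function would typically be applied through a grouped pandas UDF rather than a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on one group."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_per_group(rows, max_workers=4):
    """Split rows by key, then fit one small model per key in parallel."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((x, y))
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each group is trained independently, so the work parallelises trivially.
        models = pool.map(fit_line, (groups[k] for k in keys))
        return dict(zip(keys, models))


# Hypothetical toy data: (group key, feature x, target y).
rows = [
    ("store_a", 0.0, 1.0), ("store_a", 1.0, 3.0), ("store_a", 2.0, 5.0),
    ("store_b", 0.0, 0.0), ("store_b", 1.0, -1.0), ("store_b", 2.0, -2.0),
]
models = train_per_group(rows)
for key, (slope, intercept) in sorted(models.items()):
    print(key, slope, intercept)
```

Because each group's model depends only on that group's rows, the per-group function needs no shared state, which is what makes it easy to hand to a distributed engine. Logging each fitted model to a tracking tool would be a natural next step, but is left out here.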

Bio: Matt Thomson leads the Machine Learning practice for Resident Solutions Architects in EMEA. His team works with customers to help them build and deliver their Machine Learning and Big Data/Spark applications, from inception and architecture design through to implementation and production. Previously, Matt was Head of Data Science for Credit Card Strategic Analytics within a global bank, and as a consultant he helped develop the ML capability for a UK Government department. Matt did a PhD in Astrophysics, studying the evolution of distant galaxies.
