PyData Cambridge - 39th Meetup

Hosted By
Federico and 4 others

Details

Join us for our first meetup in 2023!

Agenda

18:45 - Doors open (Please do not arrive earlier)
19:00 - Introduction
19:15 - What's wrong with my code? Questions to ask about data science code... and a few possible answers (by Lucas Bordeaux, Cyted)
19:50 - Interval
20:15 - Better data engineering with Dagster: Building maintainable, reliable and extensible pipelines (by Jaymin Mistry, Faculty)
20:50 - End (Pub - Old Ticket Office, Station Square)

What's wrong with my code? Questions to ask about data science code... and a few possible answers

Machine Learning and data science projects can be complex: it's not uncommon to see teams that produce great science, but whose progress and delivery are slowed down by software engineering issues.
This talk is a summary of some observations Lucas has made on various data science projects over the years. It will focus on questions that he finds himself asking when looking at code, and that you might find useful in your own work as well.

Bio: Lucas Bordeaux has developed software for AI projects of various kinds since his PhD that started (just) in the past century.
He's currently doing some Machine Learning at Cyted, a Cambridge startup that provides early diagnosis for oesophageal cancer.
Cyted processes some fun data including lots of histopathology images (this talk won't be about this, though).
He's written code with sophisticated languages such as Ada, OCaml, F#, C#, C++, TypeScript... and despite being attracted to recent languages like Julia, Rust or Swift, he now seems to be a full-time Python developer - which wasn't his idea of what the 2020s would look like.
He understands that Pythonistas Anonymous will allow him to meet other people with the same problem, and is looking forward to group therapy.

Better data engineering with Dagster: Building maintainable, reliable and extensible pipelines

In this talk, Jaymin will cover some of the common challenges faced by data scientists dealing with data pipelines in the wild, share his experiences from a variety of organisations, and give an introduction to using Dagster. Data scientists and data teams continue to face challenges when maintaining, troubleshooting and extending data pipelines, moving from experimentation to production, and handling data dependencies. Dagster is an open source orchestrator developed by Elementl that extends earlier orchestration tools with features that help data teams better develop and manage their data pipelines. Building on previous DAGs, it helps you maintain, run and extend data pipelines when working as part of a larger data team.

Bio: Jaymin Mistry is a senior data scientist at Faculty, a London-based data science consultancy. He originally studied Biology and learnt R before seeing the light and switching to Python. He has worked as a data scientist at a technical consultancy, at a software company, and within the public sector, where he helped run a data science team at the UKHSA (UK Health Security Agency) during the COVID pandemic. He enjoys helping organisations make better decisions using data.

Code of Conduct
PyData is dedicated to providing a harassment-free event experience for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of participants in any form.
The PyData Code of Conduct (http://pydata.org/code-of-conduct.html) governs this meetup. To discuss any issues or concerns relating to the code of conduct or the behaviour of anyone at a PyData meetup, please contact NumFOCUS Executive Director Leah Silen (leah@numfocus.org) or the organizers.

PyData Cambridge
Raspberry Pi Foundation
37 Hills Road · Cambridge