PyData London - 98th Meetup

Name: PyData London - 98th Meetup
Start: 2025-08-05T19:00:00+01:00
End: 2025-08-05T21:00:00+01:00
Location: EC4R 3AD

Hosted By

Alexandra R. and Prashant T.

Details

Venue: Riverbank House, 2 Swan Ln, London EC4R 3AD

Please note:

🚨🚨🚨 A valid photo ID is required by building security. 🚨🚨🚨
This event follows the NumFOCUS Code of Conduct. Please familiarise yourself with it before the event.

If your RSVP status says "You're going" you will be able to get in. No need to show your RSVP confirmation when signing in.
If you can no longer make it, please unRSVP as soon as you know.
***
Code of Conduct:
This event follows the NumFOCUS Code of Conduct. Please get in touch with the organisers with any questions or concerns regarding the Code of Conduct.
***
As always, there'll be free food & drinks, generously provided by our host, Man Group.
***

Main Talks

1. How to prepare your AI Agents for the Ice(berg) Age - Serhii Sokolenko

In a future world where AI agents interact with billions of users, many of these agents will also have to interact with data querying tools to provide answers grounded in facts. As enterprise data analytics is rapidly moving towards open table formats like Apache Iceberg, these agents need to be able to speak to Iceberg-based data. In this talk, we will discuss how Apache Iceberg tooling and portable application runtimes make agents grounded in facts and enable them to run across different GPU stacks and deployment models.

2. Fuzzy, Not Fussy: Using AI to Tackle Data Entity Resolution at Scale - Yash Sakhuja @sakhuja_yash

Messy customer data — typos, inconsistent formats, and duplicates — can make it surprisingly difficult to answer basic questions, such as “How many unique customers do we have?” In this talk, I’ll share how I built an AI-powered fuzzy-matching system using Python and vector embeddings to accurately group similar customer records. Drawing on a real-world e-commerce use case, I’ll walk through a scalable, end-to-end solution that automates deduplication and delivers clean, reliable customer insights.

⚡ Lightning Talks

1. Continuous Prompt Evaluation and Optimisation in Production - Anand Rawat

Prompt engineering is no longer a one-time creative task; it has become an iterative, data-driven process. In production environments, prompts must be evaluated, optimised, and versioned just like code or machine-learning models. In this talk, I will demonstrate how to build a CI/CD pipeline for prompt development using tools such as TruLens, Weights & Biases, and custom evaluation metrics.

2. TBD

Logistics
Doors open at 6.30 pm (arrive early, as you have to sign in via building security), talks start at 7 pm, and drinks are served in the bar from 9 pm. Capacity is reduced for this event, but there will be plenty of people to discuss data science questions with!
Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members!
Follow @pydatalondon (https://twitter.com/pydatalondon) for updates and early announcements.
