PyData London - 82nd Meetup


Details
Venue: Riverbank House, 2 Swan Ln, London EC4R 3AD
IMPORTANT: LOCATION UPDATED!
Please note:
- 🚨🚨🚨A valid photo ID is required by building security. 🚨🚨🚨
- This event follows the NumFOCUS Code of Conduct. Please familiarise yourself with it before the event.
If your RSVP status says "You're going" you will be able to get in. No need to show your RSVP confirmation when signing in.
If you can no longer make it, please unRSVP as soon as you know.
***
Code of Conduct:
This event follows the NumFOCUS Code of Conduct. Please get in touch with the organisers with any questions or concerns regarding the Code of Conduct.
***
As always, there'll be free food & drinks, generously provided by our host, Man Group.
***
Main Talks
Toolbox of a not-so Data Scientist - Tambe Tabitha Achere
This talk is about building data science solutions in scenarios where demos cannot be done in a notebook and dashboards do not suffice as a final deliverable. By the end of this session, the audience will have an idea of how data scientists can build the logic behind full-stack applications without the need to learn a backend framework.
I will do a deep dive into one of my projects, with lots of code samples accompanied by explanations of the reasoning behind the design decisions. The project I'll be diving into is one in which the data could not be pulled in, so if you've ever had to build for data you couldn't see, this session is for you too. I'll highlight the tools, packages and processes that enabled it to be built.
Boosting Similarity Search With Real-time Stream Processing - Fawaz Ghali
The goal of similarity search and vector databases is to find results similar to a search query over unstructured data, such as text, images, and videos. The unstructured data is first vectorized and stored in a vector format. There are publicly available tools to create vectors from unstructured data, and there are vector databases to store those vectors and perform similarity searches. This is important because of the rising popularity of Large Language Models (LLMs) and their combination with vector databases. Here, we present a hybrid approach that takes the strengths of vector databases and boosts them with traditional search and filtering techniques based on real-time stream processing. Vector databases are good for building high-performance vector search applications, while stream processing can be used for fast, real-time storage of structured data (filters, tags, and contextual data). In this work, we add context and memory to vector databases to ingest, enrich, predict, and act on your data in a simplified but efficient approach.
In this talk, we'll focus on how real-time compute APIs help leverage the processing capabilities of a distributed cluster, so you aren't leaving large potential performance gains on the table. The combination of real-time storage and computing provides a unique synergy that enables applications to address real-time use cases at any scale.
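For readers new to the idea, the hybrid of structured filtering and vector similarity described above can be sketched in a few lines of plain Python. This is only a toy illustration, not the speaker's implementation: it uses an in-memory list instead of a vector database or a stream processing engine, and the names `cosine_similarity`, `hybrid_search`, `ITEMS`, and the tag values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, items, required_tag, top_k=2):
    """Filter on structured metadata first, then rank by vector similarity."""
    # Step 1: traditional filtering on structured data (here, a tag).
    candidates = [item for item in items if required_tag in item["tags"]]
    # Step 2: similarity ranking over the remaining vectors.
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query_vec, item["vector"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:top_k]]


# Hypothetical toy data: 2-dimensional vectors with tags as metadata.
ITEMS = [
    {"id": "doc-a", "vector": [1.0, 0.0], "tags": {"news"}},
    {"id": "doc-b", "vector": [0.9, 0.1], "tags": {"blog"}},
    {"id": "doc-c", "vector": [0.0, 1.0], "tags": {"news"}},
    {"id": "doc-d", "vector": [0.7, 0.7], "tags": {"news"}},
]

results = hybrid_search([1.0, 0.0], ITEMS, required_tag="news")
print(results)  # ['doc-a', 'doc-d']
```

Note that "doc-b" is close to the query vector but is excluded by the tag filter, which is the point of combining the two kinds of search. In a real system the filter and the vector index would typically live in separate, distributed components.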
⚡ Lightning Talks
Open-Source Science (OSSci) - Tim Bonnemann
Open-Source Science (OSSci) is a new NumFOCUS initiative – launched in July 2022 in partnership with IBM – that aims to accelerate scientific research by improving the ways open source software in science gets done (built, used, funded, sustained, recognized, etc.). OSSci connects scientists, OSS developers and other stakeholders to share best practices, identify common pain points, and explore solutions together. The five OSSci interest groups to date cover domain-specific topics (chemistry/materials, life sciences/healthcare, climate/sustainability) as well as cross-domain topics (reproducibility, map of science), with more to be rolled out in 2024. This lightning talk will provide a brief overview of OSSci’s activities to date, our plans for 2024, and how you can get involved.
(Maybe) faster Pandas with CuDF on the GPU (perhaps) - Ian Ozsvald
NVIDIA's CuDF promises 100-1000x GPU speed-ups with 100% compatibility, and with a bit of effort it can be made to work. This talk shows what could work and which bits (including setup!) can be painful.
Logistics
Doors open at 6.30 pm (get there early as you have to sign in via building security), talks start at 7 pm, drinks from 9 pm in the bar. We will have reduced capacity for this event but there will be plenty of people to discuss data science questions with!
Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members!
Follow @pydatalondon (https://twitter.com/pydatalondon) for updates and early announcements.