About us
Submit a talk: https://london.pydata.org/submit-a-talk/
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
The PyData Code of Conduct governs this meetup. To discuss any issues or concerns relating to the code of conduct or the behavior of anyone at a PyData meetup, please contact NumFOCUS Executive Director Leah Silen (+1 512-222-5449; [leah@numfocus.org](mailto:leah@numfocus.org)) or the group organizer.
Upcoming events

PyData London - 104th Meetup
Venue: Riverbank House, 2 Swan Ln, London EC4R 3AD
Please note:
1. 🚨🚨🚨 A valid photo ID is required by building security. 🚨🚨🚨
2. This event follows the NumFOCUS Code of Conduct. Please familiarise yourself with it before attending, and get in touch with the organisers with any questions or concerns.
If your RSVP status says "You're going" you will be able to get in. No need to show your RSVP confirmation when signing in.
If you can no longer make it, please unRSVP as soon as possible.
As always, there will be free food and drinks, generously provided by our host, Man Group.
Main Talks
- SVD-ROM: Reduced Order Modeling of huge arrays using the Singular Value Decomposition - David Salvador-Jasin
We present SVD-ROM, an open source Python package to perform Reduced Order Modeling (ROM) of very large arrays using the Singular Value Decomposition (SVD). Despite the high dimensionality of datasets typically encountered in real-world scenarios such as fluid dynamics or weather & climate, these are typically low-rank: there are a few dominant patterns that explain the high dimensionality. The SVD is a useful dimensionality reduction technique for such systems, and has the advantages of being explainable and computationally efficient. SVD-ROM enables the application of SVD-based machine learning methods such as Principal Component Analysis (PCA), Proper Orthogonal Decomposition (POD) or Dynamic Mode Decomposition (DMD) on massive datasets. We employ Dask for parallel and out-of-core computation, Xarray for labelled N-dimensional arrays, and data formats such as Zarr and NetCDF for chunked storage of large arrays. We have developed SVD-ROM keeping user-friendliness in mind, exposing a Python API that enables the straightforward application of methods such as pre-processing, model fitting, reconstruction and forecasting. Additionally, Dask makes code portability across laptop, HPC clusters or cloud straightforward, enabling the user to run SVD-ROM on different computational environments with minimal changes. The development of SVD-ROM has also resulted in multiple contributions to PyDMD, the most popular open source Python package for the application of DMD. In this talk we show how we can fit a DMD model to a global weather dataset tens of GB in size and produce an accurate sub-seasonal forecast (6 weeks ahead) on a laptop in just a few minutes.
- From Zero to HyperPod: Cutting Infrastructure Complexity for Distributed Model Training on AWS - Anton Nazaruk
Training state‑of‑the‑art models often stalls on two pain points: GPU availability and infrastructure complexity. In this 40‑minute, practitioner‑level session, I will demystify the fastest, most cost‑effective routes to dedicated NVIDIA H100 and B200 clusters on AWS. We will compare EC2 On‑Demand, Spot, Savings Plans, EC2 Capacity Blocks, and the new SageMaker Training Plans with HyperPod, showing where each shines for research, production training, and massive fine‑tunes. A live demo will provision a 256‑GPU HyperPod, walk through distributed‑training set‑up with DeepSpeed + PyTorch, and showcase built‑in checkpointing and automatic node recovery. Attendees will leave with: 1) a decision framework for picking the right reservation model, 2) reference Terraform and CDK snippets to launch clusters in minutes, and 3) cost‑optimisation tips that have saved our clients up to 35% versus unmanaged fleets. The focus is on practical architectures, benchmarks, and war stories you can apply the next day.
Lightning Talks
- Agentic AI for Personalized Care: Lessons and Challenges from Holisticare.io - Mojtaba Kargar
- Mapping the International PyData Community featuring Web Scraping and Data Wrangling - Hugh Evans
--------------------------------
Logistics
Doors open at 6:30 pm (get there early as you'll need to sign in with building security).
Talks start at 7:00 pm, with drinks afterwards from 9:00 pm at The Banker (EC4).
We have reduced capacity for this event, but there will be plenty of people to discuss data science questions with.
Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members.