
Details

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
The best-selling O'Reilly book "AI Systems Performance Engineering" is now available in eBook and print editions: 1000 pages, 200 figures, 700 examples!

Amazon: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

GitHub: https://github.com/cfregly/ai-performance-engineering

Talk #1: Diving deep into NVIDIA Nsight Systems GPU profiling tools for PyTorch LLM and computer vision workloads by Chaim Rand
In this talk, Chaim Rand (a repeat speaker in this webinar series!) revisits the NVIDIA Nsight profiling tools to augment the PyTorch Profiler for LLM and vision workloads. This talk is based on Chaim's recent blog posts, "Optimizing Data Transfer in AI/ML Workloads", parts 1 and 2.

Talk #2: KV Cache Efficiency + Context "Platform" Engineering by Valentin Bercovici and Callan Fox (WekaIO)
This presentation will include demos and code, with a focus on improving KV-cache hit rates, and will introduce a methodology called Context "Platform" Engineering for designing and optimizing AI infrastructure for Agent Swarm Context at scale. Context Platform Engineering was recently featured in the CES 2026 keynote by Jensen Huang, CEO of NVIDIA. This presentation is related to a recent AIE CODE Summit talk from December 2025.

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Free Generative AI Course on DeepLearning.AI: https://bit.ly/gllm

AI summary

By Meetup

Online meetup for ML engineers exploring GPU, CUDA, and PyTorch performance optimizations; attendees will learn techniques to accelerate PyTorch workloads.
