NVIDIA Nsight GPU Profiling + KV Cache Efficiency + Context "Platform" Engineering
Details
Zoom link: https://us02web.zoom.us/j/82308186562
Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
The best-selling O'Reilly book "AI Systems Performance Engineering" is now available (eBook and physical!): 1000 pages, 200 figures, 700 examples!
Amazon: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
GitHub: https://github.com/cfregly/ai-performance-engineering
Talk #1: Diving deep into NVIDIA Nsight Systems GPU profiling tools for PyTorch LLM and computer vision workloads by Chaim Rand
In this talk, Chaim Rand (repeat speaker on this webinar series!) revisits the NVIDIA Nsight profiling tools to augment the PyTorch Profiler for LLM and vision workloads. This talk is based on Chaim's recent blog posts on Optimizing Data Transfer in AI/ML Workloads part 1 and part 2.
Talk #2: KV Cache Efficiency + Context "Platform" Engineering by Valentin Bercovici and Callan Fox (WekaIO)
This presentation will include demos and code, with a focus on improving KV-cache hit rates. It also introduces a methodology called Context "Platform" Engineering for designing and optimizing AI infrastructure for Agent Swarm Context at scale. Context Platform Engineering was recently featured in the CES 2026 keynote by Jensen Huang, CEO of NVIDIA. This presentation is related to a recent AIE CODE Summit talk from December 2025.
Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm
AI summary
By Meetup
Online meetup for ML engineers exploring GPU, CUDA, and PyTorch performance optimizations; attendees will learn techniques to accelerate PyTorch workloads.
