Name: NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving + CUDA Optimizations
Start: 2025-09-15T12:00:00-04:00
End: 2025-09-15T13:00:00-04:00

**Zoom link**: [https://us02web.zoom.us/j/82308186562](https://us02web.zoom.us/j/82308186562)

**Talk #0: Introductions and Meetup Updates**
by Chris Fregly and Antje Barth

**Talk #1: NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving by Chris Alexiuk @ NVIDIA**
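NVIDIA Dynamo splits LLM serving into disaggregated prefill and decode stages. Prefill (processing the full input prompt) is compute-bound, while decode (generating output tokens one at a time) is memory-bandwidth-bound, so running them separately lets each stage scale independently and improves throughput while still meeting latency targets. In this session, we'll dive deep into how Dynamo implements disaggregated serving.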
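
The core idea can be illustrated with a toy simulation. The following is a minimal sketch in Python, not Dynamo's actual API: `Request`, `PrefillWorker`, `DecodeWorker`, and `serve` are hypothetical names, no real model runs, and the handoff queue only stands in for the KV-cache transfer between stages that a real disaggregated system performs. It shows the two stages running on separately sized worker pools.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    prompt_tokens: int
    max_new_tokens: int
    kv_cache_len: int = 0  # number of tokens whose K/V entries are cached
    generated: int = 0     # output tokens produced so far


class PrefillWorker:
    """Hypothetical prefill worker: processes the whole prompt in one pass."""

    def run(self, req: Request) -> Request:
        req.kv_cache_len = req.prompt_tokens  # KV cache now covers the prompt
        req.generated = 1                      # prefill also yields the first token
        return req


class DecodeWorker:
    """Hypothetical decode worker: one new token per request per step."""

    def step(self, batch: list[Request]) -> tuple[list[Request], list[Request]]:
        running, done = [], []
        for req in batch:
            if req.generated < req.max_new_tokens:
                req.kv_cache_len += 1  # cache K/V of the previously emitted token
                req.generated += 1
            if req.generated >= req.max_new_tokens:
                done.append(req)
            else:
                running.append(req)
        return running, done


def serve(requests: list[Request], num_prefill: int, num_decode: int) -> list[Request]:
    # The two pools are sized independently: the point of disaggregation.
    prefill_pool = [PrefillWorker() for _ in range(num_prefill)]
    decode_pool = [DecodeWorker() for _ in range(num_decode)]

    # Stage 1: prefill, round-robin over prefill workers. The handoff queue
    # stands in for transferring each request's KV cache to the decode stage.
    handoff: deque[Request] = deque()
    for i, req in enumerate(requests):
        handoff.append(prefill_pool[i % num_prefill].run(req))

    # Stage 2: decode, round-robin assignment into per-worker batches.
    batches: list[list[Request]] = [[] for _ in range(num_decode)]
    for i, req in enumerate(handoff):
        batches[i % num_decode].append(req)

    finished: list[Request] = []
    for worker, batch in zip(decode_pool, batches):
        while batch:
            batch, done = worker.step(batch)
            finished.extend(done)
    return finished


if __name__ == "__main__":
    reqs = [Request(0, 128, 4), Request(1, 64, 2), Request(2, 32, 1)]
    for r in serve(reqs, num_prefill=2, num_decode=1):
        print(r.rid, r.generated, r.kv_cache_len)
    # prints:
    # 1 2 65
    # 2 1 32
    # 0 4 131
```

Because `num_prefill` and `num_decode` are separate parameters, a workload dominated by long prompts could add prefill workers without touching the decode pool, and vice versa; that independence is the property the talk description highlights.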

**Talk #2: High-Performance CUDA Optimizations by Chris Fregly and others**
CUDA optimization techniques for high-performance AI workloads.

**Related Links**
GitHub Repo: [http://github.com/cfregly/ai-performance-engineering/](http://github.com/cfregly/ai-performance-engineering/)
O'Reilly Book: [https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/](https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/)
YouTube: [https://www.youtube.com/@AIPerformanceEngineering](https://www.youtube.com/@AIPerformanceEngineering)
Generative AI Free Course on DeepLearning.ai: [https://bit.ly/gllm](https://bit.ly/gllm)

Chris Fregly

AI Performance Engineering Meetup (Washington DC 2)

Every 3rd Monday of the month until November 17, 2025

Online event