NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving + CUDA Optimizations

Hosted By
Chris F.

Details

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving by Chris Alexiuk @ NVIDIA
NVIDIA Dynamo splits LLM serving into disaggregated prefill and decode stages, letting each stage scale independently for better throughput under latency constraints. This session dives deep into how Dynamo implements disaggregated serving.

Talk #2: High Performance CUDA Optimizations by Chris Fregly and Others
CUDA Optimizations for high-performance AI.

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Free Generative AI Course on DeepLearning.AI: https://bit.ly/gllm

AI Performance Engineering Meetup (Madrid)

Every 3rd Monday of the month until November 25, 2025

Online event
Only attendees can see the link
FREE