Name: NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving + CUDA Optimizations
Start: 2025-09-16T01:00:00+09:00
End: 2025-09-16T02:00:00+09:00

**Zoom link**: [https://us02web.zoom.us/j/82308186562](https://us02web.zoom.us/j/82308186562)

**Talk #0: Introductions and Meetup Updates**
by Chris Fregly and Antje Barth

**Talk #1: NVIDIA Dynamo + Disaggregated Prefill-Decode LLM Serving by Chris Alexiuk @ NVIDIA**
NVIDIA Dynamo splits LLM serving into disaggregated prefill and decode stages, letting each stage scale independently for better throughput under latency constraints. This session dives deep into how Dynamo implements disaggregated serving.

**Talk #2: High Performance CUDA Optimizations by Chris Fregly and Others**
A look at CUDA optimizations for high-performance AI.

**Related Links**
GitHub Repo: [http://github.com/cfregly/ai-performance-engineering/](http://github.com/cfregly/ai-performance-engineering/)
O'Reilly Book: [https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/](https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/)
YouTube: [https://www.youtube.com/@AIPerformanceEngineering](https://www.youtube.com/@AIPerformanceEngineering)
Free Generative AI Course on DeepLearning.AI: [https://bit.ly/gllm](https://bit.ly/gllm)

Chris Fregly

AI Performance Engineering Meetup (Tokyo)

Every 3rd Tuesday of the month until November 26, 2025

Online event
