Build Enterprise-Worthy LLM Inference with Open Source and Kubernetes
Details
Scaling LLMs to production introduces critical challenges: How do you orchestrate multi-node execution? Optimize GPU scheduling? Achieve low-latency data transfers between nodes? This session shows how NVIDIA Dynamo and Azure Kubernetes Service (AKS) solve these challenges together.
Through a real-world e-commerce recommendation scenario, discover how open-source frameworks and managed Kubernetes unlock enterprise-grade inference performance.
Learn to harness advanced hardware such as the GB200 NVL72, optimize distributed workloads on AKS, and deliver cost-efficient AI at scale, turning infrastructure complexity into competitive advantage.
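As a rough illustration of the kind of workload the session covers, here is a minimal sketch of a Kubernetes Deployment that schedules LLM inference replicas onto an AKS GPU node pool. The image name, node pool label, replica count, and GPU count are hypothetical placeholders, not values from the session:

```yaml
# Hypothetical sketch: an LLM inference worker pinned to a GPU node pool.
# Assumes the NVIDIA device plugin is installed so "nvidia.com/gpu" is
# an allocatable resource on the GPU nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      nodeSelector:
        agentpool: gpunp            # placeholder AKS GPU node pool name
      containers:
      - name: worker
        image: example.azurecr.io/llm-inference:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1       # one GPU per replica
```

Requesting GPUs through the `nvidia.com/gpu` resource lets the Kubernetes scheduler, rather than the application, handle GPU placement, which is the scheduling problem the session's multi-node discussion builds on.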
