
Details

Join this live session to explore what drives performance in modern AI systems with experts from Microsoft and NVIDIA. We’ll break down how latency and throughput shape responsiveness in large language models and share techniques you can use to improve performance. Learn how hardware, batching, model size, and inference optimizations affect system efficiency, and see benchmarking in action across different configurations in Azure. Discover how to unlock new levels of model performance through advanced infrastructure from Azure and NVIDIA.
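The latency/throughput trade-off discussed in the session can be illustrated with a minimal Python sketch. Note that `fake_generate` below is a stand-in that simulates decode time with a sleep, not a real Azure or NVIDIA inference API; the per-token timing is an arbitrary assumption for illustration only.

```python
import time

def fake_generate(prompt: str, tokens_out: int = 64, per_token_s: float = 0.0005) -> str:
    """Stand-in for a model call; sleeps to simulate token-by-token decoding."""
    time.sleep(tokens_out * per_token_s)
    return "x" * tokens_out

def benchmark(prompts: list[str], tokens_out: int = 64) -> tuple[float, float]:
    """Run prompts sequentially; return (avg latency per request, tokens/sec)."""
    start = time.perf_counter()
    for p in prompts:
        fake_generate(p, tokens_out)
    elapsed = time.perf_counter() - start
    avg_latency = elapsed / len(prompts)
    throughput = len(prompts) * tokens_out / elapsed
    return avg_latency, throughput

if __name__ == "__main__":
    lat, tput = benchmark(["hello"] * 8)
    print(f"avg latency: {lat:.3f}s, throughput: {tput:.0f} tok/s")
```

Real benchmarking tools apply the same idea across batch sizes, model sizes, and hardware configurations: lower per-request latency generally improves responsiveness, while batching raises aggregate throughput at some latency cost.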

📌 This episode is part of a series. Learn more here!

Sponsors

Microsoft Reactor YouTube
Watch past Microsoft Reactor events on-demand anytime
Microsoft Learn AI Hub
Learning hub for all things AI
Microsoft Copilot Hub
Learning hub for all things Copilot
Microsoft Reactor LinkedIn
Follow Microsoft Reactor on LinkedIn
