High-Performance AI Agent Inference Optimizations + vLLM vs. SGLang vs. TensorRT

Hosted by Chris Fregly

Details

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: LLM Engineers Almanac + GPU Glossary + Inference Benchmarks for vLLM, SGLang, and TensorRT + Inference Optimizations by Charles Frye @ Modal
Just as applications rely on SQL engines to store and query structured data, modern LLM deployments need “LLM engines” to manage weight caches, batch scheduling, and hardware-accelerated matrix operations. A recent survey of 25 open-source and commercial inference engines (arxiv.org) highlights rapid gains in usability and performance, suggesting that the software stack now meets the baseline quality needed for cost-effective, self-hosted LLM inference. Tools like Modal’s LLM Engine Advisor further streamline adoption by benchmarking throughput and latency across configurations and by giving engineers ready-to-use code snippets for deployment on serverless cloud infrastructure.

https://modal.com/llm-almanac/advisor

Talk #2: High-Performance Agentic AI Inference Systems by Chris Fregly
High-performance LLM inference is critical for mass adoption of AI agents. In this talk, I will demonstrate how to capture the full capabilities of today’s GPU hardware using highly tuned inference software such as vLLM and NVIDIA Dynamo to serve autonomous AI agents at ultra-scale. Drawing on recent breakthroughs, I'll show how co-designing software with cutting-edge hardware can address the scaling challenges of the inference environments that AI agents require. This talk is based on my upcoming book, AI Systems Performance Engineering: Optimizing GPUs, CUDA, and PyTorch.

https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/

O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/

YouTube: https://www.youtube.com/@AIPerformanceEngineering

Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm

AI Performance Engineering Meetup (Dubai)

Every 3rd Tuesday of the month until October 20, 2025

Online event
FREE