
What we’re about
This meetup focuses on AI performance engineering, including GPUs, CUDA, PyTorch, TensorFlow, Kubernetes, optimizations, high-throughput training clusters, and low-latency inference clusters.
Upcoming events (4+)
- Dynamic/Adaptive RL-based Inference Tuning + Accelerated PyTorch with Mojo/MAX
Zoom link: https://us02web.zoom.us/j/82308186562
Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
Talk #1: Building Accelerated PyTorch Operations with Mojo and the MAX runtime by Ehsan Kermani @ Modular (the Mojo folks)
Ehsan will dive deep into the Mojo interfaces that enable developers to write PyTorch custom ops directly in Mojo. He’ll walk through how the interfaces work, show examples such as a Mojo-accelerated deep learning model like Whisper, and explain how this opens the door to integrating MAX and Mojo into existing PyTorch workflows.
Talk #2: Dynamic and Adaptive AI Inference Serving Optimization Strategies with CUDA and vLLM by Chris Fregly, Author of AI Systems Performance Engineering
Ultra-large language model (LLM) inference on modern hardware requires dynamic runtime adaptation to achieve both high throughput and low latency under varying conditions. A static “one-size-fits-all” approach to model-serving optimizations is no longer sufficient.
Instead, state-of-the-art model-serving systems use adaptive strategies that adjust parallelism, numerical precision, CUDA-kernel scheduling, and memory usage on the fly. This talk explores these advanced techniques, including dynamic parallelism switching, precision scaling, real-time cache management, and reinforcement learning (RL)-based tuning.
By the end of this talk, you will understand best practices for ultra-scale LLM inference. You will learn how to orchestrate an inference engine that monitors its own performance and adapts in real time to maximize efficiency.
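To make the idea of an engine that "monitors its own performance and adapts in real time" concrete, here is a minimal sketch using an epsilon-greedy bandit, one of the simplest RL-style tuners. It is an illustration only, not the method presented in the talk. Everything in it is an assumption: the `ServingConfig` knobs, the `simulated_reward` function (a toy stand-in for measured throughput and latency), and the 100 ms latency target. It does not call vLLM or CUDA.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServingConfig:
    """Hypothetical knobs an adaptive serving loop might tune."""
    max_batch_size: int
    precision: str  # label only ("fp16" or "fp8"); nothing is quantized here


@dataclass
class EpsilonGreedyTuner:
    """Keeps a running mean reward per config and mostly picks the best one."""
    configs: list
    epsilon: float = 0.1
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    counts: dict = field(default_factory=dict)
    mean_reward: dict = field(default_factory=dict)

    def select(self):
        # Try every config once before exploiting.
        untried = [c for c in self.configs if c not in self.counts]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.configs)
        return max(self.configs, key=lambda c: self.mean_reward[c])

    def update(self, config, reward):
        # Incremental mean update: new_mean = old_mean + (reward - old_mean) / n
        n = self.counts.get(config, 0) + 1
        prev = self.mean_reward.get(config, 0.0)
        self.counts[config] = n
        self.mean_reward[config] = prev + (reward - prev) / n

    def best(self):
        return max(self.mean_reward, key=self.mean_reward.get)


def simulated_reward(config, rng):
    """Toy telemetry (assumed): throughput minus a penalty for missing a latency target."""
    throughput = config.max_batch_size * (1.5 if config.precision == "fp8" else 1.0)
    latency_ms = 20.0 + 2.0 * config.max_batch_size
    penalty = 2.0 * max(0.0, latency_ms - 100.0)  # assumed 100 ms target
    return throughput - penalty + rng.gauss(0.0, 1.0)


def run(steps=500, seed=0):
    rng = random.Random(seed)
    configs = [ServingConfig(b, p) for b in (8, 32, 64) for p in ("fp16", "fp8")]
    tuner = EpsilonGreedyTuner(configs=configs, epsilon=0.1, rng=rng)
    for _ in range(steps):
        cfg = tuner.select()
        tuner.update(cfg, simulated_reward(cfg, rng))
    return tuner.best()


if __name__ == "__main__":
    print(run())
```

In a real deployment the reward would presumably come from live metrics such as tokens per second and tail latency rather than a simulator, and the set of knobs would depend on what the serving engine exposes at runtime.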
Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm
- GPU, CUDA, and PyTorch Performance Optimizations
Zoom link: https://us02web.zoom.us/j/82308186562
Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
Talk #1: GPU, PyTorch, and CUDA Performance Optimizations
Talk #2: GPU, PyTorch, and CUDA Performance Optimizations
Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm