AI Performance Engineering Meetup (San Francisco, Global)

Name: The AI Conference 2025 (In-Person @ Pier 48 Mission Bay, Sept 17-18 15% Off)
Start: 2025-09-17T14:00:00.000Z
End: 2025-09-19T02:00:00.000Z

San Francisco, CA, US

15,804 members · Public group

Organized by Chris Fregly and 1 other

What we’re about

This meetup is focused on AI Performance Engineering.

Upcoming events (4+)

Mon, Jul 21, 2025, 4:00 PM UTC
Dynamic/Adaptive RL-based Inference Tuning + Accelerated PyTorch with Mojo/MAX
Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: Building Accelerated PyTorch Operations with Mojo and the MAX runtime by Ehsan Kermani @ Modular (the Mojo folks)

Ehsan will dive deep into the Mojo interfaces that enable developers to write PyTorch custom ops directly in Mojo. He’ll walk through how the interfaces work, show examples such as a Mojo-accelerated deep learning model like Whisper, and explain how this opens the door to integrating MAX and Mojo into existing PyTorch workflows.

Talk #2: Dynamic and Adaptive AI Inference Serving Optimization Strategies with CUDA and vLLM by Chris Fregly, Author of AI Systems Performance Engineering

Ultra-large language model (LLM) inference on modern hardware requires dynamic runtime adaptation to achieve both high throughput and low latency under varying conditions. A static “one-size-fits-all” approach to model-serving optimizations is no longer sufficient.

Instead, state-of-the-art model-serving systems use adaptive strategies that adjust parallelism, numerical precision, CUDA-kernel scheduling, and memory usage on the fly. This talk explores these advanced techniques, including dynamic parallelism switching, precision scaling, real-time cache management, and reinforcement learning (RL)-based tuning.

By the end of this talk, you will understand best practices for ultra-scale LLM inference. You will learn how to orchestrate an inference engine that monitors its own performance and adapts in real time to maximize efficiency.
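
As a rough illustration of the "monitor and adapt" idea in the abstract above, here is a minimal Python sketch of an epsilon-greedy bandit (one of the simplest RL-style tuners) that picks a serving configuration based on observed throughput and latency. This is not the method presented in the talk. The config key max_batch_size, the reward shape, the 200 ms latency target, and the simulated measurements are all assumptions made up for this example; a real engine would feed in its own metrics.

```python
import random


class EpsilonGreedyTuner:
    """Chooses among candidate serving configs with an epsilon-greedy bandit."""

    def __init__(self, configs, epsilon=0.1, seed=0):
        self.configs = configs
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(configs)    # times each config was tried
        self.values = [0.0] * len(configs)  # running mean reward per config

    def select(self):
        # Try every config once before exploiting.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        # Explore with probability epsilon, otherwise pick the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.configs))
        return max(range(len(self.configs)), key=lambda i: self.values[i])

    def update(self, index, reward_value):
        # Incremental mean update of the estimated reward.
        self.counts[index] += 1
        self.values[index] += (reward_value - self.values[index]) / self.counts[index]

    def best(self):
        return self.configs[max(range(len(self.configs)), key=lambda i: self.values[i])]


def reward(throughput_tok_s, p99_latency_ms, latency_slo_ms=200.0):
    """Throughput, penalized for every millisecond over the latency target."""
    penalty = max(0.0, p99_latency_ms - latency_slo_ms) * 10.0
    return throughput_tok_s - penalty


def simulated_measurement(config, rng):
    """Stand-in for real metrics: a made-up model in which larger batches
    raise both throughput and tail latency."""
    batch = config["max_batch_size"]
    throughput = 50.0 * batch ** 0.5 + rng.gauss(0.0, 5.0)
    latency = 40.0 + 2.0 * batch
    return throughput, latency


def tune(rounds=200, seed=0):
    configs = [{"max_batch_size": b} for b in (8, 32, 128)]
    tuner = EpsilonGreedyTuner(configs, epsilon=0.1, seed=seed)
    rng = random.Random(seed)
    for _ in range(rounds):
        i = tuner.select()
        throughput, latency = simulated_measurement(configs[i], rng)
        tuner.update(i, reward(throughput, latency))
    return tuner.best()


if __name__ == "__main__":
    print(tune())
```

In this toy setup the largest batch size wins on raw throughput but misses the latency target, so the tuner settles on the middle configuration. The same loop could in principle be pointed at other knobs named in the abstract, such as numerical precision or parallelism, as long as each choice yields a measurable reward.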

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.AI: https://bit.ly/gllm
75 attendees

Mon, Aug 18, 2025, 4:00 PM UTC
GPU, CUDA, and PyTorch Performance Optimizations
Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: GPU, PyTorch, and CUDA Performance Optimizations

Talk #2: GPU, PyTorch, and CUDA Performance Optimizations

38 attendees

Mon, Sep 15, 2025, 4:00 PM UTC
GPU, CUDA, and PyTorch Performance Optimizations
Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: GPU, PyTorch, and CUDA Performance Optimizations

Talk #2: GPU, PyTorch, and CUDA Performance Optimizations

18 attendees

Wed, Sep 17, 2025, 7:00 AM PDT
The AI Conference 2025 (In-Person @ Pier 48 Mission Bay, Sept 17-18 15% Off)
RSVP with the 15% discount code Fregly25 at https://aiconference.com/#tickets

Join us for the most anticipated AI event of the year and share two days with the brightest minds in AI.
The AI Conference 2025 is an in-person event scheduled for Wednesday, September 17th and Thursday, September 18th at Pier 48 in Mission Bay, San Francisco. Your admission covers both knowledge-filled days with amazing opportunities to learn, connect and build with this vibrant community.

2 Days | 100+ Speakers | 4 Tracks

85+ Top AI Companies Exhibiting

The newest agentic, robotic, and frontier AI technology

Deep industry-specific talks on Applied AI

Live AI-tech Startup Competition with VC Judges

Meaningful Networking & 6 Months App Access

AI After Dark Networking Mixer + Expo Booth Crawl

The AI Conference Hack Day

11 attendees