Details

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: Optimizing AI Inference for Heterogeneous Clusters
by Natalie Serrino, Founder @ Gimlet Labs
This talk covers the performance benefits and technical challenges of deploying inference workloads across heterogeneous hardware. This approach is a good fit for agents because agents are inherently heterogeneous, and combining GPUs with SRAM-centric architectures leads to major speedups within the same power envelope. It also brings challenges: you have to figure out how to slice workloads and orchestrate them across all of this hardware, get the different devices to communicate with each other, and develop performant code for each target platform.

Speaker: Natalie Serrino, Founder @ Gimlet Labs (https://www.linkedin.com/in/natalieserrino/ @ https://gimletlabs.ai/)

Talk #2: GPU, PyTorch, and CUDA Performance Optimizations

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm