GPU, CUDA, and PyTorch Performance Optimizations
Details
Zoom link: https://us02web.zoom.us/j/82308186562
Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
Talk #1: Optimizing AI Inference for Heterogeneous Clusters by Natalie Serrino, Founder @ Gimlet Labs
This talk will cover the performance benefits and technical challenges of deploying inference workloads across heterogeneous hardware. This approach is a good fit for agentic workloads because agents are inherently heterogeneous, and combining GPUs with SRAM-centric architectures yields major speedups within the same power envelope. The challenges: you have to figure out how to slice workloads across all of this hardware, orchestrate them, get the devices to communicate with each other, and develop performant code for each target platform.
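To make the idea concrete, here is a minimal sketch of one way such a scheduler might slice a workload across heterogeneous devices. This is purely illustrative and hypothetical, not Gimlet Labs' actual system: the device specs, stage names, and the greedy "prefer the device whose on-chip SRAM fits the stage's weights" heuristic are all assumptions.

```python
# Hypothetical sketch: greedy placement of inference stages across
# heterogeneous devices. NOT the speaker's actual system.

from dataclasses import dataclass


@dataclass
class Device:
    name: str
    tflops: float   # assumed peak compute, TFLOP/s
    sram_mb: float  # assumed on-chip memory, MB


@dataclass
class Stage:
    name: str
    weights_mb: float  # memory footprint of this stage's weights, MB


def assign(stages, devices):
    """Greedy placement: prefer devices whose SRAM can hold the stage's
    weights (avoiding off-chip memory traffic); among candidates, pick
    the one with the most compute. Fall back to the fastest device."""
    plan = {}
    for s in stages:
        fits = [d for d in devices if d.sram_mb >= s.weights_mb]
        pool = fits or devices
        plan[s.name] = max(pool, key=lambda d: d.tflops).name
    return plan


# Toy cluster: a GPU plus an SRAM-centric accelerator (made-up numbers).
devices = [
    Device("gpu-0", tflops=300.0, sram_mb=50.0),
    Device("sram-accel-0", tflops=120.0, sram_mb=800.0),
]
stages = [
    Stage("embedding", weights_mb=40.0),  # small: fits either device
    Stage("decoder", weights_mb=600.0),   # large: only fits the accelerator
]

plan = assign(stages, devices)
# The small stage lands on the fastest device; the large stage lands on
# the device with enough on-chip memory.
```

A real orchestrator would also model interconnect bandwidth between devices and per-platform kernel performance, which is exactly the complexity the talk addresses.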
Speaker: Natalie Serrino, Founder @ Gimlet Labs (https://www.linkedin.com/in/natalieserrino/ @ https://gimletlabs.ai/)
Talk #2: GPU, PyTorch, and CUDA Performance Optimizations
Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm
