Details

February's book is "AI Systems Performance Engineering"!

This is a casual event, not a structured presentation. The discussion sometimes drifts away from the chapters, so feel free to grab the mic and help steer it back.

Feel free to join the discussion even if you have not read the book chapters! :)

Want to discuss the contents during the reading week? Join the Flyte MLOps Slack group and search for the "ai-reading-club" channel: https://slack.flyte.org/

-------------------------------------------------
About the book:
Title: AI Systems Performance Engineering
Author: Chris Fregly
Published: November 2025

https://learning.oreilly.com/library/view/ai-systems-performance/9798341627772/

Chapters:
1. Introduction and AI System Overview
2. AI System Hardware Overview
3. OS, Docker, and Kubernetes Tuning for GPU-based Environments
4. Tuning Distributed Networking Communication
5. GPU-Based Storage I/O Optimizations
6. GPU Architecture, CUDA Programming, and Maximizing Occupancy
7. Profiling and Tuning GPU Memory Access Patterns
8. Occupancy Tuning, Warp Efficiency, and Instruction-Level Parallelism
9. Increasing CUDA Kernel Efficiency and Arithmetic Intensity
10. Intra-Kernel Pipelining, Warp Specialization, and Cooperative Thread Block Clusters
11. Inter-Kernel Pipelining, Synchronization, and CUDA Stream-Ordered Memory Allocations
12. Dynamic Scheduling, CUDA Graphs, and Device-Initiated Kernel Orchestration
13. Profiling, Tuning, and Scaling PyTorch
14. PyTorch Compiler, OpenAI Triton, and XLA Backends
15. Multinode Inference, Parallelism, Decoding, and Routing Optimizations
16. Profiling, Debugging, and Tuning Inference at Scale
17. Scaling Disaggregated Prefill and Decode for Inference
18. Advanced Prefill-Decode and KV Cache Tuning
19. Dynamic and Adaptive Inference Engine Optimizations
20. AI-Assisted Performance Optimizations and Scaling Toward Multimillion GPU Clusters

Book Description
Elevate your AI system performance capabilities with this definitive guide to unlocking peak efficiency across every layer of your AI infrastructure. In today's era of ever-growing generative models, AI Systems Performance Engineering equips professionals with actionable strategies to co-optimize hardware, software, and algorithms for high-performance and cost-effective AI systems. Authored by Chris Fregly, a performance-focused engineering and product leader, this comprehensive resource transforms complex systems into streamlined, high-impact AI solutions.
Inside, you'll discover step-by-step methodologies for fine-tuning GPU CUDA kernels, PyTorch-based algorithms, and multinode training and inference systems. You'll also master the art of scaling GPU clusters for high performance, distributed model training jobs, and inference servers.

  • Codesign and optimize hardware, software, and algorithms to achieve maximum throughput and cost savings
  • Implement cutting-edge inference strategies that reduce latency and boost throughput in real-world settings
  • Utilize industry-leading scalability tools and frameworks
  • Profile, diagnose, and eliminate performance bottlenecks across complex AI pipelines
  • Integrate full stack optimization techniques for robust, reliable AI system performance

Whether you're an engineer, researcher, or developer, AI Systems Performance Engineering offers a holistic roadmap for building resilient, scalable, and cost-effective AI systems that excel in both training and inference.

