NVIDIA GTC 2026 Conf Recap + Evolution of Flash Attention v1-v4 Optimizations
Details
Zoom link: https://us02web.zoom.us/j/82308186562
Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth
Talk #1: NVIDIA GTC 2026 AI Conference Recap by Chris Fregly
In this talk, Chris will present the AI and systems highlights from the NVIDIA GTC 2026 conference, which takes place the week before this meetup.
Conference registration link:
https://www.nvidia.com/gtc/ (Use code GTC26-20 for 20% off!)
Talk #2: Evolution and Deep Dive into Flash Attention (v1-v4) for Transformers on NVIDIA GPUs by Seth Weidman @ Sentilink and Author of "Deep Learning from Scratch" @ O'Reilly
In this talk, Seth will break down the evolution of Flash Attention, an optimized, mechanically sympathetic implementation of the attention mechanism that is fundamental to the Transformer architecture in modern LLMs.
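As a rough, illustrative preview (not part of Seth's talk materials), the sketch below contrasts standard attention, which materializes the full seq_len x seq_len score matrix, with a blockwise online-softmax pass that never does; this tiling with running-max/denominator rescaling is the core idea introduced in Flash Attention v1. Function names and the block size here are arbitrary choices for the example.

import numpy as np

def standard_attention(Q, K, V):
    # Baseline: materializes the full (seq_len x seq_len) score matrix.
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ V

def blockwise_attention(Q, K, V, block_size=64):
    # Flash-Attention-style pass: visit K/V in tiles, keep a running max and
    # running softmax denominator per query row, and rescale the partial output
    # whenever the running max changes, so the full score matrix never exists.
    seq_len, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full((seq_len, 1), -np.inf)
    row_sum = np.zeros((seq_len, 1))
    for start in range(0, seq_len, block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = (Q @ k_blk.T) * scale
        new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescale old accumulators
        p = np.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        row_max = new_max
    return out / row_sum

# The two should agree up to floating-point error:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
assert np.allclose(standard_attention(Q, K, V), blockwise_attention(Q, K, V))

Avoiding the full score matrix is what lets Flash Attention keep the working set in on-chip SRAM and cut traffic to GPU HBM; later versions (v2 through v4) largely refine work partitioning and hardware-specific scheduling around this same recurrence.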
Related links:
Blog: https://modal.com/blog/reverse-engineer-flash-attention-4
Github: https://github.com/Dao-AILab/flash-attention
Arxiv paper: https://arxiv.org/abs/2205.14135
Related Links
Github Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm
AI summary
By Meetup
Online meetup for AI engineers; learn NVIDIA GTC highlights and Flash Attention v1–v4 optimizations for Transformers on NVIDIA GPUs.