Details

Zoom link: https://us02web.zoom.us/j/82308186562

Talk #0: Introductions and Meetup Updates
by Chris Fregly and Antje Barth

Talk #1: NVIDIA GTC 2026 AI Conference Recap by Chris Fregly

In this talk, Chris will present the AI and systems highlights from the NVIDIA GTC 2026 conference, held the prior week.

Conference registration link:
https://www.nvidia.com/gtc/ (Use code GTC26-20 for 20% off!)

Talk #2: Evolution and Deep Dive into Flash Attention (v1-v4) for Transformers on NVIDIA GPUs by Seth Weidman @ Sentilink and Author of "Deep Learning from Scratch" @ O'Reilly

In this talk, Seth will break down the evolution of Flash Attention, an optimized, mechanically sympathetic implementation of the attention mechanism that is fundamental to the Transformer architecture in modern LLMs.

Related links:

Blog: https://modal.com/blog/reverse-engineer-flash-attention-4
GitHub: https://github.com/Dao-AILab/flash-attention
arXiv paper: https://arxiv.org/abs/2205.14135

Related Links
GitHub Repo: http://github.com/cfregly/ai-performance-engineering/
O'Reilly Book: https://www.amazon.com/Systems-Performance-Engineering-Optimizing-Algorithms/dp/B0F47689K8/
YouTube: https://www.youtube.com/@AIPerformanceEngineering
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm
