What we’re about
We host monthly virtual meetups on machine learning, system design, recsys, etc. Each meetup is a ~30-40 minute talk, followed by a Q&A and then an optional after-party.
Come for the talk 🧑‍🏫, stay for the after-party 🎉
View past meetups and join our Discord!
Note: Messages in Meetup are not monitored.
Upcoming events (1)
GPU optimization workshop
We’re hosting a workshop on GPU optimization with stellar speakers from OpenAI, NVIDIA, Meta, and Voltron Data.
The event will take place on Zoom and be livestreamed on YouTube. Discussions will happen in parallel on our Discord. A small group of attendees on Zoom can ask the speakers questions directly.
[12:00] Crash course on GPU optimization (Mark Saroufim @ Meta)
Mark is a PyTorch core developer and cofounder of CUDA MODE. He also ran the really fun NeurIPS LLM Efficiency challenge last year. Previously, he was at Graphcore and Microsoft.
Mark will give an overview of why GPUs, the metrics that matter, and different GPU programming models (thread-based CUDA and block-based Triton). He promises this will be a painless guide to writing CUDA/Triton kernels! This talk will give us the basics to understand the rest of the workshop.
[12:45] High-performance LLM serving on GPUs (Sharan Chetlur @ NVIDIA)
Sharan is a principal engineer working on TensorRT-LLM at NVIDIA. He’s been working on CUDA since 2012, optimizing the performance of deep learning models from a single GPU to full data center scale. Previously, he was the Director of Engineering at Cerebras.
Sharan will discuss how to build performant, flexible solutions to optimize LLM serving given the rapid evolution of new models and techniques. The talk will cover optimization techniques such as token concatenation, different strategies for batching, and caching.
[13:20] Block-based GPU Programming with Triton (Philippe Tillet @ OpenAI)
Philippe is currently leading the Triton team at OpenAI. Previously, he was at pretty much all major chip makers including NVIDIA, AMD, Intel, and Nervana.
Philippe will explain how Triton works and how its block-based programming model differs from the traditional single instruction, multiple threads (SIMT) programming model that CUDA follows. Triton aims to be higher-level than CUDA while being more expressive (lower-level) than common graph compilers like XLA and TorchInductor.
[14:00] Intro to data processing on GPUs (William Malpica @ Voltron Data)
William is a co-founder of Voltron Data and the creator of BlazingSQL. He helped scale Theseus, a GPU-native query engine, to handle 100TB queries!
Most people today use GPUs for training and inference. Data processing is a category of workloads that GPUs excel at but are underutilized for. William will discuss why large-scale data processing should be done on GPUs instead of CPUs and how different tools like cuDF, RAPIDS, and Theseus leverage GPUs for data processing.