About us
A monthly meetup for engineers curious about how LLMs actually run in production.
Whether you're just getting started with vLLM or optimising CUDA kernels for fun, you're welcome here :)
We dig into the systems-level work behind fast, cheap AI inference: GPU architecture, KV cache management, benchmarking, and everything in between.
The format is TBC, but the point is to learn and connect!
Upcoming events
No upcoming events
