About us
Join the vLLM community to discuss optimising LLM inference!
This community meetup group brings together engineers, researchers, and platform practitioners interested in high-performance generative AI inference with vLLM. The group focuses on practical discussion of operating and optimising large language model serving systems, covering topics such as scalable GPU inference architectures, model serving patterns, batching and scheduling strategies, streaming responses, and integration with enterprise AI platforms.
Participants will also explore the broader ecosystem around vLLM, including emerging distributed inference frameworks such as llm-d, and how these technologies enable efficient deployment of modern LLM workloads in production. The meetup encourages hands-on knowledge sharing, exchanging real-world deployment experiences and performance-tuning techniques, and discussing the new capabilities shaping the future of open GenAI inference infrastructure.