vLLM Inference Meetup - Pune
Details
Inference is no longer just a deployment detail. It is becoming the core systems problem in AI.
As models get larger, workloads get more complex, and real-world expectations move toward low latency, high throughput, and sustainable cost, the conversation is shifting from “can it run?” to “can it serve well at scale?”
That is exactly what this meetup is about.
On 14 March 2026, Red Hat AI, NeevCloud, and HPE are bringing together the vLLM community in Pune for a focused afternoon on modern inference systems, practical engineering lessons, and hands-on exploration.
This meetup is for people building with LLMs in production: engineers working on model serving infrastructure, optimizing GPU utilization, reducing latency, improving token economics, or exploring new patterns like semantic routing and disaggregated serving.
What to expect
• Technical talks grounded in real inference challenges
• Practical discussions on performance, architecture, and serving tradeoffs
• Sessions around vLLM, semantic routing, and production-minded inference design
• A hands-on workshop to go beyond slides and get closer to the system
• Time to connect with engineers, maintainers, and practitioners working on the next wave of inference infrastructure
Agenda:
12:30 PM to 01:00 PM - Registration and opening remarks
01:00 PM to 01:30 PM - Keynote: Why inference matters
01:30 PM to 02:00 PM - vLLM technical introduction
02:00 PM to 02:30 PM - vLLM semantic routing
02:30 PM to 03:00 PM - Break and pizza
03:00 PM to 03:30 PM - Hands-on workshop
03:30 PM to 04:00 PM - Project Sardeenz
04:00 PM to 04:30 PM - Technical session by HPE
04:30 PM to 06:00 PM - Technical session by NeevCloud
What to bring
• Your laptop with an SSH client installed
GPU instances will be provided by the organizers
• A government-issued photo ID
Required for venue entry
• Questions, curiosity, and a strong interest in how inference systems are evolving
A few important notes
• Registration closes 24 hours before the event
• Unregistered attendees will not be allowed at the venue
• The agenda may evolve slightly as we finalize demos and live discussions
If you care about how AI systems actually serve, scale, and perform in the real world, this meetup will be worth your time. See y'all on the 14th!
