
Details

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:

  • How to install and configure vLLM step by step (see the sketch after this list)
  • Best practices for serving models efficiently with continuous batching and PagedAttention
  • How vLLM compares to traditional serving frameworks like Hugging Face’s Text Generation Inference (TGI)
  • Tips for running vLLM locally and scaling on the cloud
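
To give a flavor of the hands-on portion, here is a minimal sketch using vLLM’s offline Python API. The model name, prompts, and sampling settings below are illustrative stand-ins, not the session’s actual materials:

```python
# Minimal vLLM offline-inference sketch.
# Install first: pip install vllm
from vllm import LLM, SamplingParams

# vLLM handles continuous batching and PagedAttention internally;
# you just hand it a batch of prompts.
prompts = [
    "What is PagedAttention?",
    "Explain continuous batching in one sentence.",
]
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

# A small model keeps the local demo lightweight; any supported
# Hugging Face model ID can be substituted.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For production-style deployments, recent vLLM releases also bundle an OpenAI-compatible HTTP server (started with `vllm serve <model>`), a natural next step for the cloud-scaling topics above.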

This is a practical, no-fluff workshop: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving

Topics: Artificial Intelligence, Machine Learning, Cloud Computing, Data Analytics, Data Science

Sponsors

Cerebrone AI
Cerebrone AI provides generative AI consulting solutions.
