Getting Started with vLLM – Fast, Efficient & Production-Ready LLM Serving
Join us for a beginner-friendly, hands-on session on vLLM, one of the fastest and most memory-efficient open-source LLM serving engines in production use across the industry today.
#### 🔹 What You’ll Learn
1. Why vLLM?
- Limitations of traditional LLM serving
- Introduction to PagedAttention
- How vLLM improves throughput, latency & memory efficiency
2. vLLM Architecture (Beginner Friendly)
- Core components
- Execution model
- Scheduling, memory management & token streaming
- Integration with Hugging Face, the OpenAI-compatible API, and GPUs
3. Key Benefits
- High performance at lower cost
- Production-grade serving with minimal setup
- Scalability & multi-model hosting
- Extensibility for custom LLM workflows
4. Use Cases
- Chatbots & RAG workflows
- Enterprise GenAI applications
- Model benchmarking and fine-tuned model serving
- Multi-tenant AI platforms and AI gateways
5. Live Demo
- Running your first model on vLLM
- Deploying an LLM with OpenAI-style APIs
- Quick example: RAG + vLLM + embeddings
- Performance comparison vs other serving engines
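
To get a feel for the demo ahead of the session, here is a minimal sketch of talking to a locally running vLLM server through its OpenAI-compatible chat completions endpoint. It assumes a server has already been started (for example with `vllm serve <model>`) and is listening on the default port; the base URL and model name below are placeholder assumptions, not part of the talk materials:

```python
import json
from urllib.request import Request, urlopen

# Placeholder assumptions: a vLLM server started with `vllm serve`
# is listening on localhost:8000 and serving this model name.
VLLM_BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "facebook/opt-125m"


def build_chat_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat completion payload for a vLLM server."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }


def chat(prompt: str) -> str:
    """POST the payload to the server's OpenAI-compatible endpoint."""
    req = Request(
        f"{VLLM_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    # The response follows the OpenAI chat completions schema.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Without a running server we can still inspect the request shape.
    print(json.dumps(build_chat_request("What is PagedAttention?"), indent=2))
```

Because vLLM exposes the standard OpenAI request/response schema, the same client code works unchanged whether the backend is vLLM or a hosted OpenAI-compatible endpoint, which is exactly what makes the "deploy with OpenAI-style APIs" demo portable.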
***
### 🎤 Speakers
Suyog Kale – Founder & CTO, RagnarDataOps
Covering: Why vLLM, core architecture, performance advantages, and practical benefits for organizations.
Ravi Joshi – Technology Architect
Covering: Live demo, real-world use cases, and deployment walkthrough.
***
### 🤝 Call for Volunteers & Speakers
We welcome volunteers and speakers who want to contribute to future meetups in the open-source AI ecosystem.
***
### 👥 Open Networking
If you are a Senior Data Engineer, Data Scientist, or GenAI Engineer exploring new opportunities, feel free to connect. Happy to network and collaborate!
### 👥 Community Partner: Pune AI Community
