
### Details

Join us for a beginner-friendly, hands-on session on vLLM, a high-throughput, memory-efficient open-source engine for serving large language models, widely used in production today.

#### 🔹 What You’ll Learn

1. Why vLLM?

  • Limitations of traditional LLM serving
  • Introduction to PagedAttention
  • How vLLM improves throughput, latency & memory efficiency

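To give a feel for why PagedAttention helps, here is a minimal, illustrative sketch (not vLLM's actual implementation) of its central idea: each sequence's KV cache is split into fixed-size blocks that map to non-contiguous physical slots via a block table, much like an OS page table, so memory is allocated on demand instead of reserved for the maximum sequence length. The class names and block size below are made up for illustration.

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; vLLM uses larger blocks)

class Allocator:
    """Hands out free physical block ids from a fixed pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        return self.free.pop(0)

class BlockTable:
    """Maps a sequence's logical token positions to physical blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.physical_blocks = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the current one is full,
        # so memory grows with the sequence instead of being pre-reserved.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.physical_blocks.append(self.allocator.allocate())
        self.num_tokens += 1

    def slot_for(self, token_idx):
        # Translate a logical token position into (physical block, offset),
        # the same way a page table translates virtual addresses.
        block = self.physical_blocks[token_idx // BLOCK_SIZE]
        return block, token_idx % BLOCK_SIZE

alloc = Allocator(num_blocks=8)
seq = BlockTable(alloc)
for _ in range(6):          # generate 6 tokens
    seq.append_token()

print(seq.physical_blocks)  # [0, 1] — two blocks cover 6 tokens
print(seq.slot_for(5))      # (1, 1) — token 5 lives in block 1, offset 1
```

Because blocks need not be contiguous, fragmentation drops sharply and many more concurrent sequences fit on the same GPU, which is the source of vLLM's throughput gains.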
2. vLLM Architecture (Beginner Friendly)

  • Core components
  • Execution model
  • Scheduling, memory management & token streaming
  • Integration with Hugging Face models, an OpenAI-compatible API, and GPUs

3. Key Benefits

  • High performance at lower cost
  • Production-grade serving with minimal setup
  • Scalability & multi-model hosting
  • Extensibility for custom LLM workflows

4. Use Cases

  • Chatbots & RAG workflows
  • Enterprise GenAI applications
  • Model benchmarking and fine-tuned model serving
  • Multi-tenant AI platforms and AI gateways

5. Live Demo

  • Running your first model on vLLM
  • Deploying an LLM with OpenAI-style APIs
  • Quick example: RAG + vLLM + embeddings
  • Performance comparison vs. other serving engines

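As a preview of the OpenAI-style API part of the demo: once a vLLM server is running (typically started with a command like `vllm serve <model>`; check the vLLM docs for your version), it accepts the same JSON body as OpenAI's `/v1/chat/completions` endpoint. The sketch below builds such a request with the standard library; the port, base URL, and model name are placeholder assumptions, and the actual network call is left commented out since it needs a live server.

```python
import json
import urllib.request

# Assumption: a local vLLM server on its default port 8000.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model, prompt):
    # vLLM's server speaks the OpenAI chat-completions wire format,
    # so existing OpenAI clients work by just changing the base URL.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("my-model", "What is PagedAttention?")
print(req.full_url)  # http://localhost:8000/v1/chat/completions

# Sending requires a running server, so it is left commented out here:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's, the official `openai` Python client also works against vLLM by pointing its `base_url` at the server.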
***

### 🎤 Speakers

**Suyog Kale** – Founder & CTO, RagnarDataOps
*Covering:* Why vLLM, core architecture, performance advantages, and practical benefits for organizations.

**Ravi Joshi** – Technologist Architect
*Covering:* Live demo, real-world use cases, and deployment walkthrough.

***

### 🤝 Call for Volunteers & Speakers

We welcome volunteers and speakers who want to contribute to future meetups in the open-source AI ecosystem.

***

### 👥 Open Networking

If you are a Senior Data Engineer, Data Scientist, or GenAI Engineer exploring new opportunities, feel free to connect. Happy to network and collaborate!

### 👥 Community Partner: Pune AI Community
