Hands-On with vLLM: Fast Inference & Model Serving Made Simple
Network event

Hosted By
Raj M.

Details
Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:
- How to install and configure vLLM step by step (a minimal sketch follows this list)
- Best practices for serving models efficiently with continuous batching and PagedAttention
- How vLLM compares to traditional serving frameworks like Text Generation Inference (TGI) and Hugging Face Inference Endpoints
- Tips for running vLLM locally and scaling on the cloud
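
As a preview, here is a minimal sketch of offline inference with vLLM's Python API. It assumes you have run pip install vllm on a machine with a CUDA-capable GPU; facebook/opt-125m is just a small example model, not necessarily the one we'll use in the session:

    from vllm import LLM, SamplingParams

    # Load a model; PagedAttention and continuous batching are handled internally.
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings for generation.
    params = SamplingParams(temperature=0.8, max_tokens=64)

    # Prompts are scheduled and batched automatically.
    outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
    print(outputs[0].outputs[0].text)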
This is a practical, no-fluff workshop—you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
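
To give a flavor of the serving side, the sketch below queries vLLM's OpenAI-compatible server from Python using the openai client. Port 8000 is the server's default, and the model name and "EMPTY" API key are placeholders:

    # Start the server in a separate shell first, e.g.:
    #   vllm serve facebook/opt-125m
    # (older vLLM versions: python -m vllm.entrypoints.openai.api_server --model ...)
    from openai import OpenAI

    # vLLM's server speaks the OpenAI API; any placeholder key works locally.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    resp = client.completions.create(
        model="facebook/opt-125m",
        prompt="Hello, vLLM!",
        max_tokens=32,
    )
    print(resp.choices[0].text)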
🔹 Format: Live coding + Q&A
🔹 Who it’s for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving

New Jersey Artificial Intelligence Meetup Group
Online event
FREE