
Hands-On with vLLM: Fast Inference & Model Serving Made Simple

Network event · 29 attendees from 3 groups hosting
Hosted by Raj M.

Details

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:

  • How to install and configure vLLM step by step
  • Best practices for serving models efficiently with continuous batching and PagedAttention (see the first sketch after this list)
  • How vLLM compares to other serving frameworks such as Hugging Face’s TGI and hosted inference endpoints
  • Tips for running vLLM locally and scaling it in the cloud (see the client example after this list)
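
To ground the install-and-configure step, here’s a minimal offline-inference sketch using vLLM’s Python API (installed with `pip install vllm`). The model name and prompts are placeholders, not part of the event description; any Hugging Face causal LM you can download works the same way.

```python
from vllm import LLM, SamplingParams

# Placeholder prompts for the demo.
prompts = [
    "Explain PagedAttention in one sentence.",
    "Why does batching requests improve GPU utilization?",
]

# Sampling settings shared by every prompt in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM() loads the weights and allocates the paged KV cache;
# PagedAttention and continuous batching happen under the hood.
llm = LLM(model="facebook/opt-125m")  # placeholder model

# generate() schedules all prompts together rather than one at a time.
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```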

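For the local-serving tip, recent vLLM releases also ship an OpenAI-compatible HTTP server (e.g. started with `vllm serve <model>`, listening on port 8000 by default). The snippet below sketches a client call against such a local server; the model name and generation settings are assumptions for illustration.

```python
import requests

# Assumes a vLLM server was started separately, for example:
#   vllm serve facebook/opt-125m
resp = requests.post(
    "http://localhost:8000/v1/completions",  # OpenAI-compatible endpoint
    json={
        "model": "facebook/opt-125m",  # must match the served model
        "prompt": "vLLM speeds up inference because",
        "max_tokens": 48,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```
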
This is a practical, no-fluff workshop: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving

New Jersey Artificial Intelligence Meetup Group
Free