
Details

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:

  • How to install and configure vLLM step by step
  • Best practices for serving models efficiently with continuous batching and PagedAttention
  • How vLLM compares to traditional serving frameworks like Text Generation Inference (TGI) and Hugging Face Inference
  • Tips for running vLLM locally and scaling in the cloud

This is a practical, no-fluff workshop: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
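
If you want a preview, here is a minimal sketch of what a basic vLLM setup looks like in Python. Assumptions: vLLM installed via pip on a CUDA-capable machine, and facebook/opt-125m used purely as a lightweight stand-in for whichever model the workshop covers:

# pip install vllm   (assumes Python 3.9+ and a CUDA-capable GPU)
from vllm import LLM, SamplingParams

# Load a model; facebook/opt-125m is an illustrative placeholder.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, max_tokens=64)

# Submit a batch of prompts; vLLM schedules them with continuous
# batching and PagedAttention under the hood.
outputs = llm.generate(["The fastest way to serve an LLM is"], params)
print(outputs[0].outputs[0].text)

To put the same model behind an OpenAI-compatible HTTP API instead, recent vLLM releases ship a CLI entry point (vllm serve facebook/opt-125m); older releases use python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m.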
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving
