
Hands-On with vLLM: Fast Inference & Model Serving Made Simple

Network event · 29 attendees from 3 groups hosting
Hosted by Raj M.

Details

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:

  • How to install and configure vLLM step by step
  • Best practices for serving models efficiently with continuous batching and PagedAttention (see the first sketch after this list)
  • How vLLM compares to other serving frameworks such as Hugging Face’s TGI and hosted inference endpoints
  • Tips for running vLLM locally and scaling it in the cloud (see the client example after this list)
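
To ground the install-and-configure step, here’s a minimal offline-inference sketch using vLLM’s Python API (installed with `pip install vllm`). The model name and prompts are placeholders, not part of the event description; any Hugging Face causal LM you can download works the same way.

```python
from vllm import LLM, SamplingParams

# Placeholder prompts for the demo.
prompts = [
    "Explain PagedAttention in one sentence.",
    "Why does batching requests improve GPU utilization?",
]

# Sampling settings shared by every prompt in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM() loads the weights and allocates the paged KV cache;
# PagedAttention and continuous batching happen under the hood.
llm = LLM(model="facebook/opt-125m")  # placeholder model

# generate() schedules all prompts together rather than one at a time.
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```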

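For the local-serving tip, recent vLLM releases also ship an OpenAI-compatible HTTP server (e.g. started with `vllm serve <model>`, listening on port 8000 by default). The snippet below sketches a client call against such a local server; the model name and generation settings are assumptions for illustration.

```python
import requests

# Assumes a vLLM server was started separately, for example:
#   vllm serve facebook/opt-125m
resp = requests.post(
    "http://localhost:8000/v1/completions",  # OpenAI-compatible endpoint
    json={
        "model": "facebook/opt-125m",  # must match the served model
        "prompt": "vLLM speeds up inference because",
        "max_tokens": 48,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```
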
This is a practical, no-fluff workshop: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving

New Jersey Artificial Intelligence Meetup Group
Free