
Details

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:

  • How to install and configure vLLM step by step
  • Best practices for serving models efficiently with continuous batching and PagedAttention
  • How vLLM compares to traditional serving frameworks like Text Generation Inference (TGI) and Hugging Face Inference
  • Tips for running vLLM locally and scaling in the cloud

This is a practical, no-fluff workshop: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
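
If you want a preview, here is a minimal sketch of what a basic vLLM setup looks like in Python. Assumptions: vLLM installed via pip on a CUDA-capable machine, and facebook/opt-125m used purely as a lightweight stand-in for whichever model the workshop covers:

# pip install vllm   (assumes Python 3.9+ and a CUDA-capable GPU)
from vllm import LLM, SamplingParams

# Load a model; facebook/opt-125m is an illustrative placeholder.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, max_tokens=64)

# Submit a batch of prompts; vLLM schedules them with continuous
# batching and PagedAttention under the hood.
outputs = llm.generate(["The fastest way to serve an LLM is"], params)
print(outputs[0].outputs[0].text)

To put the same model behind an OpenAI-compatible HTTP API instead, recent vLLM releases ship a CLI entry point (vllm serve facebook/opt-125m); older releases use python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m.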
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving
