
Details

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:

  • How to install and configure vLLM step by step (see the sketch after this list)
  • Best practices for serving models efficiently with continuous batching and PagedAttention
  • How vLLM compares to traditional serving frameworks like Hugging Face’s Text Generation Inference (TGI)
  • Tips for running vLLM locally and scaling on the cloud
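
To give a flavor of the hands-on portion, here is a minimal sketch using vLLM’s offline Python API. The model name, prompts, and sampling settings below are illustrative stand-ins, not the session’s actual materials:

```python
# Minimal vLLM offline-inference sketch.
# Install first: pip install vllm
from vllm import LLM, SamplingParams

# vLLM handles continuous batching and PagedAttention internally;
# you just hand it a batch of prompts.
prompts = [
    "What is PagedAttention?",
    "Explain continuous batching in one sentence.",
]
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

# A small model keeps the local demo lightweight; any supported
# Hugging Face model ID can be substituted.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For production-style deployments, recent vLLM releases also bundle an OpenAI-compatible HTTP server (started with `vllm serve <model>`), a natural next step for the cloud-scaling topics above.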

This is a practical, no-fluff workshop: you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving

Topics: Artificial Intelligence, Machine Learning, Cloud Computing, Data Analytics, Data Science

Sponsors

Cerebrone AI
Cerebrone AI provides generative AI consulting solutions.
