
vLLM: Efficient Memory Management for Large Language Model Serving

Hosted By
Giorgio Zoppi

Details

Join us for an insightful session on vLLM: Efficient Memory Management for Large Language Model Serving, where we dive into cutting-edge techniques for optimizing the performance and scalability of large language models in production environments.
This event will explore how vLLM leverages advanced batching strategies and memory management algorithms to significantly reduce latency and increase throughput when serving massive models. Attendees will gain a deep understanding of:

  • The challenges of serving large language models at scale
  • Innovative approaches to efficient memory utilization
  • Batching techniques that maximize hardware efficiency without compromising model accuracy
  • Practical insights on implementing vLLM in real-world applications
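To make the memory-utilization point above concrete, here is a minimal, hypothetical sketch of the block-table bookkeeping behind vLLM's PagedAttention idea: the KV cache is carved into fixed-size blocks allocated on demand, so memory is not reserved for a request's maximum possible length. This is plain-Python illustration only (real vLLM manages GPU tensors); class names, `BLOCK_SIZE`, and the pool size are invented for the example.

```python
# Toy model of paged KV-cache bookkeeping (illustrative, not vLLM's API).
# Real vLLM stores key/value tensors on the GPU; here we only model the
# block-table logic that lets many sequences share one fixed pool.

BLOCK_SIZE = 4  # tokens per KV-cache block (chosen for the example)

class BlockAllocator:
    """Hands out fixed-size cache blocks from a shared free pool."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def alloc(self) -> int:
        return self.free.pop()

    def free_blocks(self, blocks: list[int]) -> None:
        self.free.extend(blocks)

class Sequence:
    """A request whose KV cache grows one token at a time."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical -> physical block ids
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the last one is full,
        # so at most BLOCK_SIZE - 1 slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

    def release(self) -> None:
        # Finished requests return their blocks to the shared pool,
        # freeing room for other sequences in the batch.
        self.allocator.free_blocks(self.block_table)
        self.block_table = []

alloc = BlockAllocator(num_blocks=8)
seq = Sequence(alloc)
for _ in range(6):               # cache 6 tokens
    seq.append_token()
print(len(seq.block_table))      # 2 blocks cover 6 tokens at 4 tokens/block
seq.release()
print(len(alloc.free))           # all 8 blocks back in the pool
```

Because blocks are allocated lazily and returned on completion, the same pool can serve many concurrent requests, which is the memory headroom that makes larger batches (and hence higher throughput) possible.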

Whether you’re a developer, data scientist, or ML engineer, this session will equip you with the knowledge to enhance your LLM serving pipelines, ensuring faster and more cost-effective deployments.

Artificial Intelligence Horizons
FREE