
Details

PyTorch ATX is joining forces with the vLLM community on September 17th for a hands-on look at the next generation of AI inference pipelines. We'll explore the full modern stack, from aggressive model-size reductions such as INT4/INT8 quantization and pruning to dynamic batching, paged-attention memory management, and multi-node scheduling. We'll dive into vLLM, today's most popular open-source engine for high-throughput LLM inference, and then learn how to deploy at larger scale using the llm-d project.
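To give a flavor of one technique on the agenda: symmetric INT8 quantization maps floating-point weights onto 8-bit integers plus a single scale factor. The sketch below is purely illustrative plain Python, not vLLM's actual implementation (which relies on optimized kernels); the function names are our own.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative sketch only)."""
    amax = max(abs(v) for v in values)
    # Map the range [-amax, amax] onto integer codes in [-127, 127].
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [x * scale for x in q]

weights = [0.6, -1.0, 0.3, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Reconstruction error per value is bounded by half a quantization step (scale / 2).
```

Storing 8-bit codes instead of 32-bit floats cuts weight memory roughly 4x, which is one reason quantization features so prominently in inference stacks like the one this meetup covers.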

Presenters include:

  • “Getting started with inference using vLLM” - Steve Watt, PyTorch ambassador
  • “An intermediate guide to inference using vLLM - PagedAttention, Quantization, Speculative Decoding, Continuous Batching, and more” - Luka Govedič, vLLM core committer
  • “vLLM Semantic Router - Intelligent Auto Reasoning Router for Efficient LLM Inference on Mixture-of-Models” - Huamin Chen, vLLM Semantic Router project creator
  • “Combining Kubernetes and vLLM to deliver scalable, distributed inference with llm-d” - Greg Pereira, llm-d maintainer

Expect deeply technical talks, live demos, and open Q&A with the engineers building and running these systems.

When: September 17, 2025 - 5:30PM to 8:30PM
Where: Voltron Room - Capital Factory (1st Floor of Omni Hotel) in Austin, TX

Light food and beverages will be provided.

Email pytorchatx@gmail.com with any questions.
