Optimizing LLM Inference Requests
NOTE: The May 16 session will be online only, via Zoom
Our new book club series is about LLM inference. Ted has done a deep dive into how LLM inference works and the techniques for optimizing its performance. This week we will discuss chapter 6, on optimizing individual requests, including prefix caching and speculative decoding.
A free copy of the LLM Inference Illustrated book is available at https://tedkyi.github.io/llm-inference/
Come join us online. Please make sure to read the instructions for joining the event below.
Agenda:
- 12:00 - 1:15 pm -- Presentation and discussion
- Time permitting -- Additional Q&A, networking
Links to notes/slides and videos of prior meetups are available in the SDML GitHub repo: https://github.com/SanDiegoMachineLearning/bookclub
Location:
Due to conflicts, this session will be Zoom-only
Please Note: There are two steps required to join the online meetup:
- You must go to our Slack community and ask for the password for the meeting. The link to join is below.
- You must have a Zoom login in order to join the event. A free Zoom account will work. If you get an error message when joining the Zoom meeting, log in to your account on the Zoom website and then try again.
- Use this Zoom link: https://us06web.zoom.us/j/82891977558
Community:
Join our Slack channel for questions and discussion about what's new in ML:
https://join.slack.com/t/sdmachinelearning/shared_invite/zt-34vyls6jn-3cREuo8EoPmo6AKwTEgGgA
