Optimizing LLM Inference Requests
NOTE: The May 16 session will be online only, via Zoom
Our new book club series is about LLM inference. Ted has done a deep dive into how LLM inference works and the techniques for optimizing its performance. This week we will discuss chapter 6, on optimizing individual requests, including prefix caching and speculative decoding.
A free copy of the LLM Inference Illustrated book is available at https://tedkyi.github.io/llm-inference/
Come join us online. Please make sure to read the instructions for joining the event below.
Agenda:
- 12:00 - 1:15 pm -- Presentation and discussion
- Time permitting -- Additional Q&A, networking
Links to notes/slides and videos of prior meetups are available in the SDML GitHub repo: https://github.com/SanDiegoMachineLearning/bookclub
Location:
Due to conflicts, this session will be Zoom-only
Please Note: There are two steps required to join the online meetup:
- You must go to our Slack community and ask for the password for the meeting. The link to join is below.
- You must have a Zoom login in order to join the event. A free Zoom account will work. If you get an error message when joining the Zoom meeting, log in to your account on the Zoom website and then try again.
- Use this Zoom link: https://us06web.zoom.us/j/82891977558
Community:
Join our Slack channel for questions and discussion about what's new in ML:
https://join.slack.com/t/sdmachinelearning/shared_invite/zt-34vyls6jn-3cREuo8EoPmo6AKwTEgGgA
