Guided Speculative Inference for Efficient Test-Time Alignment of LLMs [ONLINE]

Details
Deriving compute-efficient methods for steering LLMs toward high-reward outputs at inference time is an important line of research in test-time scaling. In this talk, I’ll introduce Guided Speculative Inference (GSI), a new algorithm that uses speculative drafts from a small auxiliary model and a reward-likelihood tilt to provably approximate the optimal reward-regularized policy of a larger model. I will begin by motivating test-time scaling and reviewing prior approaches like soft best-of-n and reward-guided speculative decoding. Then I’ll describe the GSI algorithm, its theoretical guarantees, and its strong empirical gains on reasoning benchmarks.
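For readers new to the topic, here is a minimal sketch of soft best-of-n, one of the prior approaches mentioned above: draw n candidate responses, then sample one with probability proportional to exp(beta * reward). It is an illustration under stated assumptions, not the speaker's implementation. The function names, the `beta` parameter, and the specific form of the tilt in `tilted_rewards` (adding a scaled log-likelihood ratio between the large and small model to the reward) are hypothetical choices made for this example; the exact tilt used by GSI is defined in the talk.

```python
import math
import random


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Sample one candidate with probability proportional to exp(beta * reward).

    Large beta approaches plain best-of-n (argmax); beta = 0 is uniform sampling.
    """
    top = max(rewards)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (r - top)) for r in rewards]
    return rng.choices(candidates, weights=weights, k=1)[0]


def tilted_rewards(rewards, logp_large, logp_small, beta):
    """Hypothetical reward-likelihood tilt (an assumption for illustration).

    Adds (log p_large - log p_small) / beta to each reward, so drafts that the
    large model finds more likely than the small model does are favoured.
    """
    return [
        r + (lp_large - lp_small) / beta
        for r, lp_large, lp_small in zip(rewards, logp_large, logp_small)
    ]
```

In a speculative setting, the candidates would come from the small auxiliary model, and the tilted rewards would be passed to `soft_best_of_n` in place of the raw rewards.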
Jonathan Geuter is a PhD student in Applied Mathematics at Harvard University. He previously obtained Bachelor's and Master's degrees in Mathematics from TU Berlin.
His research interests lie in statistical machine learning, optimal transport, generative modelling, LLMs, and test-time scaling methods.