#18 Reinforcement learning from human feedback (RLHF)
Details
Please read the paper in advance. Ideally, also bring your own copy of the paper to refer to during the session.
29th October 2025: Reinforcement learning from human feedback
When: Wednesday 29th October 2025, 6 pm – 8 pm.
Where: The Castle Inn, 36 Castle Street, Cambridge, UK. (Most likely we'll be at one of the large tables upstairs.)
Paper: P. F. Christiano et al. Deep Reinforcement Learning from Human Preferences. In Advances in Neural Information Processing Systems 30, 2017.
URL: https://arxiv.org/abs/1706.03741
We'll discuss Reinforcement Learning from Human Feedback (RLHF), a foundational technique for converting a deep learning model from a mere pattern predictor into an agent that can act according to the wishes of a human operator.
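The core idea of the Christiano et al. paper is to fit a reward model to pairwise human preferences over trajectory segments, using a Bradley-Terry style cross-entropy loss, and then train a policy against that learned reward. As a warm-up for the discussion, here is a minimal sketch of that preference loss (the function name and example numbers are illustrative, not taken from the paper's code):

```python
import numpy as np

def preference_loss(r_a, r_b, mu):
    """Cross-entropy loss for one pairwise preference, in the
    Bradley-Terry style used by Christiano et al. (2017).

    r_a, r_b : predicted total rewards for trajectory segments A and B
    mu       : human label, the probability that A is preferred
               (1.0 = A preferred, 0.0 = B preferred, 0.5 = tie)
    """
    # Modelled probability that A is preferred:
    # P(A > B) = exp(r_a) / (exp(r_a) + exp(r_b))
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))
    eps = 1e-12  # numerical safety for the logarithm
    return -(mu * np.log(p_a + eps) + (1.0 - mu) * np.log(1.0 - p_a + eps))

# Example: the reward model scores segment A higher than segment B.
loss_agree = preference_loss(r_a=2.0, r_b=0.5, mu=1.0)     # human agreed
loss_disagree = preference_loss(r_a=2.0, r_b=0.5, mu=0.0)  # human disagreed
```

Minimising this loss over many labelled pairs pushes the reward model to assign higher scores to segments humans prefer; disagreement between model and human produces a larger loss than agreement.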
