Reinforcement Learning: Chapter 2 Multi-armed Bandits


Details
Last meeting we covered the first four sections of Chapter 2, which introduce the bandit testbed and evaluate a simple epsilon-greedy algorithm's performance on it. During this meeting we will continue with the nonstationary problem and constant step-size averaging techniques for action values. We will also introduce alternatives to the epsilon-greedy method for balancing exploration and exploitation.
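For anyone who wants a preview, here is a minimal sketch of the constant step-size update Q <- Q + alpha * (R - Q) from the chapter, paired with epsilon-greedy action selection on a nonstationary bandit. The arm count, step size, epsilon, and drift scale below are illustrative assumptions, not values fixed by the book.

```python
import random

NUM_ARMS = 10
ALPHA = 0.1      # constant step size: weights recent rewards more heavily
EPSILON = 0.1    # probability of exploring a random arm
STEPS = 10000

true_values = [0.0] * NUM_ARMS   # q*(a): true action values, all start equal
estimates = [0.0] * NUM_ARMS     # Q(a): incremental action-value estimates

for t in range(STEPS):
    # Epsilon-greedy action selection.
    if random.random() < EPSILON:
        action = random.randrange(NUM_ARMS)
    else:
        action = max(range(NUM_ARMS), key=lambda a: estimates[a])

    # Reward drawn around the chosen arm's (drifting) true value.
    reward = random.gauss(true_values[action], 1.0)

    # Constant step-size update: Q <- Q + alpha * (R - Q).
    estimates[action] += ALPHA * (reward - estimates[action])

    # Independent random walks on all arms make the problem nonstationary,
    # which is why a constant step size is preferred over sample averaging.
    for a in range(NUM_ARMS):
        true_values[a] += random.gauss(0.0, 0.01)
```

Because alpha is constant rather than 1/n, old rewards decay exponentially, so the estimates keep tracking the drifting true values instead of freezing around early samples.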
As usual, you can find links below to the textbook, previous chapter notes, slides, and recordings of some of the previous meetings.
Useful Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recordings of Previous Meetings
Short RL Tutorials
My exercise solutions and chapter notes
Kickoff Slides which contain other links
Video lectures from a similar course

Every 2 weeks on Monday until March 7, 2026