Reinforcement Learning: Chapter 3 Finite Markov Decision Processes


Details
Last meeting we introduced the agent-environment interface, the dynamics equation, and the discounted return. This meeting we build on those concepts to define the value function and to show how a policy determines a value function in a given environment. From there we will work through examples with concrete states and values and use them to explore how to reach the optimal policy and optimal value function.
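For anyone who wants a preview, here is a minimal Python sketch of how a policy leads to a value function. Everything in it is an assumption made for illustration and is not taken from the textbook: the two-state environment, the `dynamics` table, the equiprobable `policy`, and the helper names `discounted_return` and `evaluate_policy`. It estimates state values by repeatedly applying the Bellman expectation equation until the values stop changing.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th reward (k = 0, 1, ...) is weighted by gamma**k."""
    g = 0.0
    for r in reversed(rewards):  # work back to front: G = r + gamma * G_next
        g = r + gamma * g
    return g


# Hypothetical two-state environment, invented for this sketch.
# dynamics[state][action] is a list of (probability, next_state, reward).
dynamics = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}

# Equiprobable random policy: policy[state][action] is the action probability.
policy = {s: {"stay": 0.5, "go": 0.5} for s in dynamics}


def evaluate_policy(policy, dynamics, gamma, tol=1e-10):
    """Estimate the state values of `policy` by sweeping until changes fall below tol."""
    v = {s: 0.0 for s in dynamics}
    while True:
        delta = 0.0
        for s in dynamics:
            # Expected one-step reward plus discounted value of the next state.
            new_v = sum(
                policy[s][a] * p * (r + gamma * v[s2])
                for a, outcomes in dynamics[s].items()
                for p, s2, r in outcomes
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


values = evaluate_policy(policy, dynamics, gamma=0.9)
print(values)
```

Changing the action probabilities in `policy` and rerunning shows how a different policy produces different values in the same environment, which is the relationship this meeting focuses on.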
As usual, links to the textbook, previous chapter notes, slides, and recordings of some previous meetings are listed below.
Useful Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recordings of Previous Meetings
Short RL Tutorials
My exercise solutions and chapter notes
Kickoff Slides which contain other links
Video lectures from a similar course

Every 2 weeks on Monday until March 7, 2026