Reinforcement Learning: Chapter 4 Dynamic Programming


Details
Dynamic programming is a collection of techniques for solving the Bellman equations for value functions in reinforcement learning. In the last chapter, we introduced value functions and their associated recursive equations. In this chapter, we apply dynamic programming to compute solutions to these equations for any environment whose dynamics are completely known. Once we have these solutions, we can easily derive policies that perform optimally in any such environment. These solution methods are versions of generalized policy iteration, which alternate between evaluating the value function of the current policy and improving that policy with respect to the value function. The policy improvement theorem is the key result justifying how optimal policies are derived from value functions, and we prove that theorem in this chapter as well.
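As a rough illustration of generalized policy iteration, below is a minimal Python sketch of policy iteration on a small hypothetical finite MDP with fully known dynamics. The transition table `P`, the state/action counts, and the discount factor are made-up stand-ins for illustration, not taken from the textbook.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP with known dynamics (illustrative only).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 2.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
n_states, n_actions, gamma = 3, 2, 0.9

def evaluate(policy, theta=1e-8):
    """Iterative policy evaluation: sweep until the Bellman equation for v_pi holds."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def improve(V):
    """Greedy policy improvement with respect to V."""
    return [
        max(range(n_actions),
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in range(n_states)
    ]

# Policy iteration: alternate evaluation and improvement until the policy is stable.
policy = [0] * n_states
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:
        break
    policy = new_policy

print("optimal policy:", policy)
print("optimal values:", V)
```

The stopping condition relies on the policy improvement theorem: once greedy improvement no longer changes the policy, that policy is greedy with respect to its own value function and hence optimal for this MDP.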
As usual, you can find links below to the textbook, notes from previous chapters, slides, and recordings of some of our previous meetings.
Useful Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recordings of Previous Meetings
Short RL Tutorials
My exercise solutions and chapter notes
Kickoff Slides which contain other links
Video lectures from a similar course

Every 2 weeks on Monday until March 7, 2026