Reinforcement Learning: Policy Self-Play and Monte Carlo Tree Search
Details
This meeting begins our coverage of Section 9.8 of Multi-Agent Reinforcement Learning: Foundations and Modern Approaches, which covers policy self-play in zero-sum games. We will introduce a test environment for zero-sum games and discuss the subset of these games in which agents take turns selecting actions. In such games, the environment on any given step is equivalent to an MDP, so techniques such as Monte Carlo tree search (MCTS) can be used to search for optimal play. We will also discuss MDP solutions that optimize play against a fixed opponent and compare them with the self-play techniques from MARL.
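To make the fixed-opponent idea concrete, here is a minimal sketch, not the test environment from the meeting. It assumes a hypothetical one-pile Nim game (remove 1 to 3 stones per turn, whoever takes the last stone wins) and a hypothetical fixed opponent that picks uniformly among legal moves. Folding the opponent into the transition dynamics leaves a single-agent MDP, which the sketch solves exactly by dynamic programming rather than by MCTS. All names (`MAX_TAKE`, `opponent_policy`, `state_value`, `best_response`) are illustrative.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumed rule: a player removes 1..3 stones per turn


def legal_moves(stones):
    """Legal numbers of stones to remove from a pile of the given size."""
    return range(1, min(MAX_TAKE, stones) + 1)


def opponent_policy(stones):
    """Assumed fixed opponent: uniform random over its legal moves."""
    moves = list(legal_moves(stones))
    return {m: 1.0 / len(moves) for m in moves}


def action_value(stones, take):
    """Expected return (+1 win, -1 loss) for the agent taking `take` stones."""
    remaining = stones - take
    if remaining == 0:
        return 1.0  # the agent took the last stone and wins
    total = 0.0
    # The opponent's reply is part of the MDP's stochastic transition.
    for opp_take, prob in opponent_policy(remaining).items():
        after = remaining - opp_take
        total += prob * (-1.0 if after == 0 else state_value(after))
    return total


@lru_cache(maxsize=None)
def state_value(stones):
    """Optimal value of the induced MDP when it is the agent's turn."""
    return max(action_value(stones, a) for a in legal_moves(stones))


def best_response(stones):
    """Greedy action with respect to the optimal action values."""
    return max(legal_moves(stones), key=lambda a: action_value(stones, a))


if __name__ == "__main__":
    for s in range(1, 9):
        print(s, round(state_value(s), 3), best_response(s))
```

In this sketch a pile of 4 stones has a positive value for the agent, because the random opponent often fails to punish it, whereas against an optimal opponent the player to move at 4 stones loses. That gap is the kind of difference between a best response to a fixed opponent and a self-play solution that the comparison above is about.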
As usual, links to the textbook, previous chapter notes, slides, and recordings of some previous meetings are below.
Meetup Links:
Recordings of Previous RL Meetings
Recordings of Previous MARL Meetings
Short RL Tutorials
My exercise solutions and chapter notes for Sutton-Barto
My MARL repository
Kickoff Slides which contain other links
MARL Kickoff Slides
MARL Links:
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
MARL Summer Course Videos
MARL Slides
Sutton and Barto Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Video lectures from a similar course
