Reinforcement Learning: Building an AlphaZero Training Pipeline

Name: Reinforcement Learning: Building an AlphaZero Training Pipeline
Start: 2026-07-06T18:30:00-06:00
End: 2026-07-06T20:00:00-06:00

Hosted by Jason E.

Boulder Data Science, Machine Learning & AI

Details

Last meeting (see recording), we set up an extended tic-tac-toe game environment and showed how the Monte Carlo tree search algorithm defined in the following paper:

Danihelka, I., Guez, A., Schrittwieser, J., & Silver, D. (2022). Policy Improvement by Planning with Gumbel (ICLR 2022). https://openreview.net/forum?id=bERaNdoegnO

can improve an existing policy/value function pair trained with a traditional RL method such as actor-critic policy gradient. We examined the tree statistics collected when the number of simulations is very low, and saw how sequential halving allocates simulations among actions when the budget is larger. Finally, we compared the improved policy derived from the search tree with the original policy and showed that it outperforms the original policy in the environment.
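As a rough illustration of how sequential halving spends a larger simulation budget at the root, here is a minimal Python sketch. It is not the code from the meeting or the paper's reference implementation. It assumes the m candidate actions have already been chosen (in the paper, by taking the top m of Gumbel noise plus prior logits). The function name `sequential_halving`, the `q_sample` callback (a stand-in for one search simulation below a root action, returning a value normalized to [0, 1]), and the constants `c_visit=50` and `c_scale=1.0` in the monotone transform sigma(q) = (c_visit + max_b N(b)) * c_scale * q are all assumptions for illustration.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple


def sequential_halving(
    scores: Dict[Hashable, float],
    q_sample: Callable[[Hashable], float],
    budget: int,
    c_visit: float = 50.0,  # assumed default, tune for your environment
    c_scale: float = 1.0,   # assumed default, tune for your environment
) -> Tuple[Hashable, Dict[Hashable, int]]:
    """Hypothetical helper: split `budget` simulations among root candidates.

    scores:   g(a) + logits(a) for each candidate (Gumbel noise + prior logit)
    q_sample: returns one simulated value in [0, 1] for the given action
    Returns the surviving action and the visit count of every candidate.
    """
    candidates: List[Hashable] = list(scores)
    num_phases = max(1, math.ceil(math.log2(len(candidates))))
    visits = {a: 0 for a in candidates}
    value_sum = {a: 0.0 for a in candidates}

    for _ in range(num_phases):
        if len(candidates) == 1:
            break
        # Split the budget evenly across phases, then across remaining actions.
        # At least one visit per action, so tiny budgets can be exceeded.
        per_action = max(1, budget // (num_phases * len(candidates)))
        for a in candidates:
            for _ in range(per_action):
                value_sum[a] += q_sample(a)
                visits[a] += 1
        # sigma(q) grows with the visit count, so values matter more over time.
        scale = (c_visit + max(visits[a] for a in candidates)) * c_scale
        candidates.sort(
            key=lambda a: scores[a] + scale * value_sum[a] / visits[a],
            reverse=True,
        )
        # Keep the better half of the candidates for the next phase.
        candidates = candidates[: max(1, len(candidates) // 2)]

    return candidates[0], visits
```

For example, with 4 candidates and a budget of 16, every action receives 2 visits in the first phase, and the 2 survivors receive 4 more each in the second phase, for 16 simulations in total. Because every remaining action gets at least one visit per phase, budgets smaller than `num_phases * m` can be slightly exceeded in this sketch.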

This meeting, we will continue using the search algorithm, now collecting tree statistics over the course of entire episodes and using those statistics to build a dataset. That dataset can then be used to train an improved policy/value function, which we can use to get even better performance either on its own or in combination with search. We will discuss the factors that affect the rate of data generation and how that rate compares to the training speed. To build a successful algorithm, we will need to balance the resources allocated to each and decide how much data to save in total. Once we can collect data and train simultaneously, we can demonstrate a training pipeline that masters an MDP environment.
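One possible way to organize the dataset and the balance between data generation and training is a fixed-capacity replay buffer combined with a target replay ratio (training samples consumed per sample generated). The sketch below is a hypothetical design, not the meeting's code: `Sample`, `ReplayBuffer`, `train_steps_due`, and `replay_ratio` are illustrative names. It also assumes a two-player game and labels each position with the final game outcome (+1 win, -1 loss, 0 draw, from the view of the player to move), which follows the AlphaZero convention and is only one of several possible value targets.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Sequence


@dataclass
class Sample:
    observation: tuple          # board encoding (hypothetical format)
    policy_target: List[float]  # improved policy produced by the search
    value_target: float         # final outcome for the player to move


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO store of search-generated samples."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # Once full, the oldest samples are dropped first.
        self.data: Deque[Sample] = deque(maxlen=capacity)
        self.rng = random.Random(seed)
        self.samples_added = 0    # positions generated by self-play so far
        self.samples_trained = 0  # positions consumed by gradient steps so far

    def add_episode(
        self,
        observations: Sequence[tuple],
        policy_targets: Sequence[List[float]],
        players: Sequence[int],
        winner: Optional[int],
    ) -> None:
        """Label every position of a finished game with the game outcome."""
        for obs, pi, player in zip(observations, policy_targets, players):
            if winner is None:
                z = 0.0  # draw
            else:
                z = 1.0 if player == winner else -1.0
            self.data.append(Sample(obs, list(pi), z))
            self.samples_added += 1

    def sample_batch(self, batch_size: int) -> List[Sample]:
        """Draw a uniform batch without replacement and count it as trained."""
        batch = self.rng.sample(list(self.data), min(batch_size, len(self.data)))
        self.samples_trained += len(batch)
        return batch

    def train_steps_due(self, batch_size: int, replay_ratio: float) -> int:
        """Gradient steps needed to bring trained/added up to replay_ratio."""
        deficit = replay_ratio * self.samples_added - self.samples_trained
        return max(0, int(deficit // batch_size))
```

In this sketch, `capacity` corresponds to deciding how much data to save in total, and `replay_ratio` controls how many times each self-play position is reused on average. A higher ratio can help when search is slow relative to gradient steps, at the cost of training more on older data.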

As usual, below you can find links to the textbook, previous chapter notes, slides, and recordings of some of the previous meetings.

Meetup Links:
Recordings of Previous RL Meetings
Recordings of Previous MARL Meetings
Short RL Tutorials
My exercise solutions and chapter notes for Sutton-Barto
My MARL repository
Kickoff Slides which contain other links
MARL Kickoff Slides

MARL Links:
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
MARL Summer Course Videos
MARL Slides

Sutton and Barto Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Video lectures from a similar course
