Policy Gradient Methods for Reinforcement Learning
SDML Book Club
Policy Gradient Methods
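Reinforcement learning is a branch of machine learning that has seen many recent advances. Unlike action-value methods, policy gradient methods learn a parameterized policy directly and can select actions without consulting a value function, although a value function may still be learned to help train the policy, as in actor-critic methods. This session will: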
- Review what reinforcement learning is and the notation used in RL
- Define policy gradient methods
- Explain the REINFORCE algorithm
- Introduce actor-critic methods
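For a taste of what we will discuss, here is a minimal sketch of the REINFORCE update on a one-step (bandit) task with a softmax policy. It is not taken from the book or from our notes: the function names, the reward probabilities, the step size `alpha`, and the episode count are all made up for illustration, and it uses no baseline or discounting.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_probs, episodes=5000, alpha=0.1, seed=0):
    """REINFORCE on a one-step task with a softmax policy.

    reward_probs[a] is a hypothetical chance that action a pays reward 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # policy parameters (action preferences)
    for _ in range(episodes):
        pi = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(pi)), weights=pi)[0]
        # The episode has one step, so the return G is the immediate reward.
        G = 1.0 if rng.random() < reward_probs[action] else 0.0
        # For a softmax policy, the gradient of ln pi(action) with respect
        # to theta[k] is 1[k == action] - pi[k].
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - pi[k]
            theta[k] += alpha * G * grad_log_pi
    return softmax(theta)


probs = reinforce_bandit([0.2, 0.8])
print(probs)  # the second action should end up with the higher probability
```

Note that nothing here estimates a value function: the sampled return alone scales the update. Actor-critic methods replace or supplement that return with a learned value estimate.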
There are no prerequisites, but it may help to be familiar with the introduction to reinforcement learning material, available on our GitHub repo: https://github.com/SanDiegoMachineLearning/bookclub
Most of the content will be drawn from Reinforcement Learning: An Introduction (second edition) by Richard Sutton and Andrew Barto. The book isn't the easiest to find right now, and the hardcover on Amazon appears to be a knockoff, so you may have better luck elsewhere. Free copies of the book are also available online, including here: http://incompleteideas.net/book/the-book.html
This session will draw most of its material from chapter 13 of the Sutton & Barto book. Attendees are welcome to read the chapter before the event and bring questions or discussion items, or to use the meetup as a primer and read the chapter afterward. Everyone is welcome to participate, even without doing the reading.
=================
Agenda
- 12:00 - 12:15 pm -- Arrival and socializing
- 12:15 - 1:30 pm -- Policy gradient methods
- Time permitting -- Breakout discussions
Links to chapter notes and videos of prior meetups are available on the SDML GitHub repo: https://github.com/SanDiegoMachineLearning/bookclub
=================
Location
This will be an online meetup until further notice.
=================
Questions?
Join our Slack channel or leave a comment below if you have any questions about the group or need clarification on anything.
https://join.slack.com/t/sdmachinelearning/shared_invite/zt-6b0ojqdz-9bG7tyJMddVHZ3Zm9IajJA
