MARL Chapter 9.6: Agent Modeling with Neural Networks
Details
This meeting will continue the material from Chapter 9 in Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. In Section 9.6, we will cover joint-action value methods that require models of the policies of all other agents in the environment in order to select optimal actions. Neural networks will be trained to minimize the cross-entropy loss between an output policy and the observed actions of the other agents. We can then use those agent models in conjunction with a separate neural-network-based joint-action value model to select optimal actions.
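As a rough illustration of the agent-modeling idea (this is a minimal sketch, not the book's implementation, and all names are made up): we can model another agent's policy as a softmax classifier over its actions and fit it by gradient descent on the cross-entropy loss against that agent's observed actions.

```python
import numpy as np

# Sketch: model agent j's policy as a softmax linear model over observation
# features, trained to minimize cross-entropy against j's observed actions.
# The "true" policy below exists only to generate synthetic training data.

rng = np.random.default_rng(0)
n_features, n_actions = 4, 3
true_W = rng.normal(size=(n_features, n_actions))  # hypothetical real policy

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Observed (observation, action) pairs from the modeled agent.
obs = rng.normal(size=(500, n_features))
actions = np.array([rng.choice(n_actions, p=p)
                    for p in softmax(obs @ true_W)])

# Fit the agent model by gradient descent on mean cross-entropy.
W = np.zeros((n_features, n_actions))
lr = 0.5
for _ in range(200):
    p = softmax(obs @ W)               # predicted policy per observation
    y = np.eye(n_actions)[actions]     # one-hot observed actions
    W -= lr * (obs.T @ (p - y) / len(obs))  # cross-entropy gradient step

# Final training loss; an untrained model would sit at log(n_actions).
loss = -np.mean(np.log(softmax(obs @ W)[np.arange(len(obs)), actions]))
```

In the chapter the classifier is a deeper network, but the loss and the training signal (observed actions of the other agent) are the same.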
Previously, we avoided learning functions of joint-action values because that space grows exponentially with the number of agents. We will discuss strategies for mitigating that concern, such as sampling joint actions and selecting neural network architectures that allow more efficient iteration over the joint-action space.
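To see why sampling helps (again a hedged sketch with illustrative names, not the book's code): with n other agents each having m actions, evaluating the expected joint-action value of one of our own actions exactly requires summing over all m**n joint actions weighted by the agent models, whereas sampling joint actions from those models gives a Monte Carlo estimate whose cost does not grow with m**n.

```python
import itertools
import numpy as np

# With 6 other agents and 4 actions each, the exact expectation needs
# 4**6 = 4096 terms; the sampled estimate uses a fixed number of draws.

rng = np.random.default_rng(1)
n_others, n_actions = 6, 4

# Hypothetical learned agent models: one action distribution per other agent.
policies = rng.dirichlet(np.ones(n_actions), size=n_others)

def q_value(a_i, a_others):
    # Stand-in for a neural joint-action value model.
    return a_i + np.sum(np.cos(a_others))

def exact_expected_q(a_i):
    # Full sum over every joint action of the other agents.
    total = 0.0
    for joint in itertools.product(range(n_actions), repeat=n_others):
        prob = np.prod([policies[j][joint[j]] for j in range(n_others)])
        total += prob * q_value(a_i, np.array(joint))
    return total

def sampled_expected_q(a_i, n_samples=2000):
    # Monte Carlo estimate: draw joint actions from the agent models.
    samples = np.array([
        [rng.choice(n_actions, p=policies[j]) for j in range(n_others)]
        for _ in range(n_samples)
    ])
    return np.mean([q_value(a_i, s) for s in samples])

exact = exact_expected_q(1)
approx = sampled_expected_q(1)
```

The estimate converges to the exact sum as the number of samples grows, which is what makes this practical when m**n enumeration is infeasible.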
As usual, you can find links below to the textbook, previous chapter notes, slides, and recordings of some of the previous meetings.
Meetup Links:
Recordings of Previous RL Meetings
Recordings of Previous MARL Meetings
Short RL Tutorials
My exercise solutions and chapter notes for Sutton-Barto
My MARL repository
Kickoff Slides which contain other links
MARL Kickoff Slides
MARL Links:
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
MARL Summer Course Videos
MARL Slides
Sutton and Barto Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Video lectures from a similar course
