What we'll do
As many of you have experienced, if the update step is too big, the model does not learn effectively. Trust Region Policy Optimisation addresses the question: within how large a neighbourhood of the current parameter values can the gradient still be trusted?
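The idea of a trust region can be illustrated with a minimal sketch. It assumes a toy two-armed bandit and a policy with a single logit parameter. The reward values, the KL bound delta = 0.01 and all function names are hypothetical and chosen only for illustration. The sketch takes a natural-gradient step scaled so that the quadratic approximation of the KL divergence, 0.5 * F * step^2, equals delta. It then backtracks, in the spirit of TRPO's line search, until the true KL stays within delta and the objective improves. With a single state, the importance-weighted surrogate objective coincides with the expected reward, so the sketch checks the expected reward directly. With many parameters, TRPO would also need conjugate gradient to apply F^-1, which this scalar case avoids.

```python
import math

# Hypothetical toy setting: a two-armed bandit where the policy picks
# action 1 with probability sigmoid(theta). Rewards are illustrative only.
REWARD = {0: 0.0, 1: 1.0}

def prob_action1(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def expected_reward(theta):
    p = prob_action1(theta)
    return p * REWARD[1] + (1.0 - p) * REWARD[0]

def kl_bernoulli(p_old, p_new):
    # KL(pi_old || pi_new) between two Bernoulli policies
    return (p_old * math.log(p_old / p_new)
            + (1.0 - p_old) * math.log((1.0 - p_old) / (1.0 - p_new)))

def trust_region_step(theta, delta=0.01, backtrack=0.5, max_iters=10):
    p = prob_action1(theta)
    # Gradient of the expected reward with respect to the logit theta
    grad = p * (1.0 - p) * (REWARD[1] - REWARD[0])
    if grad == 0.0:
        return theta  # no ascent direction
    # Fisher information of a Bernoulli policy in the logit parameter
    fisher = p * (1.0 - p)
    # Natural gradient direction F^-1 g
    nat_grad = grad / fisher
    # Largest step whose quadratic KL approximation equals delta
    step = math.sqrt(2.0 * delta / (grad * nat_grad)) * nat_grad
    old_value = expected_reward(theta)
    for _ in range(max_iters):
        new_theta = theta + step
        improved = expected_reward(new_theta) > old_value
        within_region = kl_bernoulli(p, prob_action1(new_theta)) <= delta
        if improved and within_region:
            return new_theta
        step *= backtrack  # shrink the step and try again
    return theta  # no acceptable step found; keep the old parameters

theta = 0.0
for _ in range(5):
    theta = trust_region_step(theta)
print(theta, prob_action1(theta))
```

Starting from theta = 0 (both actions equally likely), each accepted update moves the policy towards the better action while keeping the KL divergence from the previous policy within delta. A larger delta permits bigger steps, at the cost of relying on the local approximation further from the current parameters.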
TRPO (Trust Region Policy Optimisation):
The math behind TRPO, PPO, and the natural gradient method:
Recommended to bring:
Laptop (for following the discussion about the paper)
This event is sponsored by Trellis Data (http://www.trellisdata.com.au).