Deep RL: Trust Region Policy Optimisation


Details
As many of you will have experienced, if the update step is too big, the model does not learn effectively. Trust Region Policy Optimisation addresses the question of how far the gradient can be trusted in the neighbourhood of the current parameter values, and limits each policy update to that region.
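For anyone who wants something concrete to play with before the session, here is a minimal sketch of the trust-region idea. It is not the algorithm from the paper: it assumes a toy softmax policy over three actions, uses a plain gradient rather than the natural gradient, and simply shrinks the step until the KL divergence between the old and new policy is within a bound. The names `trust_region_step`, `delta` and `shrink` are made up for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    # KL(p || q) for two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def trust_region_step(logits, advantages, step=10.0, delta=0.01,
                      shrink=0.5, max_backtracks=30):
    # Hypothetical helper: take a gradient step on the expected advantage,
    # halving the step until KL(old || new) <= delta.
    old = softmax(logits)
    baseline = sum(p * a for p, a in zip(old, advantages))
    # Gradient of sum_a pi(a) * A(a) with respect to the logits.
    grad = [p * (a - baseline) for p, a in zip(old, advantages)]
    for _ in range(max_backtracks):
        candidate = [x + step * g for x, g in zip(logits, grad)]
        if kl(old, softmax(candidate)) <= delta:
            return candidate, step
        step *= shrink
    # No acceptable step found: keep the old parameters.
    return list(logits), 0.0

logits = [0.0, 0.0, 0.0]
advantages = [1.0, 0.0, -1.0]
new_logits, used_step = trust_region_step(logits, advantages)
print(used_step, softmax(new_logits))
```

The initial step of 10 is deliberately too large, so the loop has to back off before the update is accepted. The paper goes further: it picks the step direction using the natural gradient and also checks that the surrogate objective actually improves.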
Paper:
https://arxiv.org/abs/1502.05477
TRPO (Trust Region Policy Optimisation):
Part 1
https://medium.com/@jonathan_hui/rl-trust-region-policy-optimization-trpo-explained-a6ee04eeeee9
Part 2
https://medium.com/@jonathan_hui/rl-trust-region-policy-optimization-trpo-part-2-f51e3b2e373a
The math behind TRPO, PPO, and Natural Gradient method:
https://medium.com/@jonathan_hui/rl-the-math-behind-trpo-ppo-d12f6c745f33
Recommended to bring:
Laptop (for following the discussion about the paper)
This event is sponsored by Trellis Data (http://www.trellisdata.com.au).
