Our focus is on PhD-level Reinforcement Learning (RL) and closely related topics (behavioral neuroscience, control theory, deep learning, machine learning), as well as applications of RL in software agents, especially game-playing bots. This includes Atari games, which are the de facto standard benchmark, but also new games and environments we implement ourselves, moving beyond what's currently popular.
We sit halfway between the theoretical (understanding the underlying math) and the applied (implementing software agents ourselves).
Group activities include:
(1) coding up software agents and environments (e.g. games, chatbots) using Python, TensorFlow, NumPy, OpenAI Gym, etc. As a specific example, group members have been implementing the popular game 2048 (see https://github.com/nickjalbert/improved-funicular ) and using it as a testbed for coding up classic RL algorithms, as well as trying out more complex out-of-the-box open-source implementations (e.g. those in OpenAI Baselines).
(2) reading recent and classic research papers and textbook chapters on core RL topics. For example: much of Sutton and Barto's RL textbook, plus papers on policy gradients, REINFORCE, Sarsa, temporal-difference learning, AlphaGo/AlphaZero, DQN, TRPO, PPO, and many more!
(3) sharing resources and taking online classes together to stay on track in our self-learning (e.g. Sergey Levine's Berkeley Deep RL course, David Silver's UCL RL course, and the Berkeley Deep RL Bootcamp videos).
(4) learning about advanced topics and recent research in RL, including hierarchical RL, meta-learning, and distributed RL.
(5) learning about open-source RL frameworks, including TF-Agents, Tensorforce, Coach, OpenAI Baselines, etc., and trying them out (e.g. spinning up a cluster of GPU machines on a public cloud and running the out-of-the-box algorithms included in these frameworks against our own RL environments).
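To give a flavor of the "coding up classic RL algorithms" activity, here is a minimal sketch of tabular Sarsa, one of the algorithms listed above. The chain environment and all hyperparameters are invented for illustration (this is not the 2048 testbed, just the smallest thing the update rule runs on):

```python
# Tabular Sarsa on a toy chain environment -- a minimal sketch of a classic
# on-policy TD control algorithm. Environment and hyperparameters are made up
# for illustration.
import random
from collections import defaultdict

random.seed(0)        # for reproducibility

N_STATES = 6          # states 0..5; reaching state 5 ends the episode with reward +1
ACTIONS = [-1, +1]    # step left or right along the chain

def env_step(state, action):
    """One transition: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(Q, state, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def train(episodes=3000, alpha=0.1, gamma=0.9):
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        action = epsilon_greedy(Q, state)
        while not done:
            next_state, reward, done = env_step(state, action)
            next_action = epsilon_greedy(Q, next_state)
            # Sarsa update: the on-policy TD target bootstraps from the
            # action actually selected in the next state.
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
    return Q

Q = train()
```

The same scaffolding carries over to a richer testbed like 2048 by swapping `env_step` for the game's transition function (and, eventually, the table for a function approximator).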
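Running framework algorithms "against our own RL environments" mostly comes down to exposing the Gym-style reset()/step() interface. Below is a stdlib-only sketch; the coin-guessing task is invented for illustration, and a real Gym/Baselines integration would additionally subclass gym.Env and declare action_space/observation_space, omitted here:

```python
# A toy environment exposing the classic Gym-style API: reset() returns an
# observation, step(action) returns (observation, reward, done, info).
import random

class CoinFlipEnv:
    """Toy episodic task: guess the hidden coin; a correct guess ends the episode."""

    MAX_STEPS = 10

    def reset(self):
        self.coin = random.randint(0, 1)  # hidden state, resampled each episode
        self.steps = 0
        return 0  # single dummy observation

    def step(self, action):
        self.steps += 1
        correct = action == self.coin
        reward = 1.0 if correct else 0.0
        done = correct or self.steps >= self.MAX_STEPS
        return 0, reward, done, {}  # (observation, reward, done, info)

# A random-policy rollout -- the usual smoke test before handing the
# environment to a framework's learner.
env = CoinFlipEnv()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(random.choice([0, 1]))
    total_reward += reward
```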
Anybody is welcome to join, though the focus will be on relatively advanced topics, so it will be helpful if you already know:
- ML basics (e.g., you've worked through Andrew Ng's Coursera course, especially gradient descent and related optimization techniques)
- Software engineering in Python, using Jupyter notebooks, and the basics of autodiff software like TensorFlow or PyTorch
- Calculus and probability (e.g., Bayes' theorem, the law of total expectation, Markov chains, inference, importance sampling, etc.)
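As a taste of the probability prerequisites, here is a worked example of importance sampling, which also underlies off-policy RL methods: estimating an expectation under one distribution using only samples from another. The distributions and target function are arbitrary illustrative choices; the exact answer is E_p[X^2] = 1 for p = N(0, 1):

```python
# Importance sampling: estimate E_p[f(X)] with p = N(0, 1) and f(x) = x^2,
# using samples drawn from q = N(1, 1) reweighted by p(x)/q(x).
import math
import random

def normal_pdf(x, mu, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

random.seed(0)
n = 100_000
samples = [random.gauss(1.0, 1.0) for _ in range(n)]           # draw from q
weights = [normal_pdf(x, 0.0) / normal_pdf(x, 1.0) for x in samples]
estimate = sum(w * x * x for x, w in zip(samples, weights)) / n
# estimate lands close to 1.0, the exact value of E_p[X^2]
```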