RL Working Group and OpenEvolve Demo by Asankhaya Sharma


Details
We have a potential collaboration with Manning Publications.
We are working through Grokking Deep Reinforcement Learning by Miguel Morales. Make your own demos and present them to the group.
Gridworld demos:
https://colab.research.google.com/gist/dougc333/1e2a97a6589b930ac3ec66e647223417/untitled0.ipynb#scrollTo=cThyggKOCuIV&line=1&uniqifier=1
Individual Project Review
RL: the fundamental RL concepts are required for any work with LLMs and agents. For example, the exploration/exploitation tradeoff appears in the config.yaml files for OpenEvolve. You work on your own projects and use the weekly meetups to motivate progress.
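To make the exploration/exploitation tradeoff concrete, here is a minimal epsilon-greedy sketch on a multi-armed bandit. It is a generic textbook illustration, not OpenEvolve's implementation or its config format; the function names, the Gaussian reward model, and the parameter values are all assumptions chosen for the demo.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a bandit with unit-variance Gaussian rewards (assumed model)."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)    # running estimate of each arm's mean reward
    counts = [0] * len(true_means)  # number of pulls per arm
    for _ in range(steps):
        arm = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the estimate for the pulled arm.
        q[arm] += (reward - q[arm]) / counts[arm]
    return q, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.2, 0.5, 2.0])
    print("estimates:", [round(x, 2) for x in estimates])
    print("pulls per arm:", pulls)
```

Try a few values of epsilon: with epsilon at 0 the agent can lock onto whichever arm looked good first, while a large epsilon keeps spending pulls on arms it already knows are worse.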
AlphaEvolve from DeepMind is an example of how to structure LLM coding agents. These agents are far more advanced than the cut-and-paste suggestions in VS Code.
There is a rudimentary open-source version of it called OpenEvolve.
We will review how to get it working.
https://github.com/codelion/openevolve
https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Coursera RL: 4 courses (Fundamentals, Sampling, Approximation, Capstone). Start here. We are on Course #2, Sampling, currently covering TD(0).
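For anyone catching up on TD(0), the update is V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]. The sketch below applies it to a five-state random walk, a common textbook prediction task; it is not taken from the course materials, and the environment, function names, and hyperparameters are assumptions for illustration.

```python
import random


def td0_update(v, s, reward, s_next, alpha, gamma):
    """Return the TD(0)-updated value of state s after one observed transition."""
    td_error = reward + gamma * v[s_next] - v[s]
    return v[s] + alpha * td_error


def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values for a 5-state random walk under a uniform random policy.

    States 1..5 are non-terminal; 0 and 6 are terminal with value 0.
    Reward is 1 on reaching state 6 and 0 on every other transition.
    """
    rng = random.Random(seed)
    n_states = 5
    right_terminal = n_states + 1
    v = [0.0] * (n_states + 2)
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, right_terminal):
            s_next = s + rng.choice((-1, 1))
            reward = 1.0 if s_next == right_terminal else 0.0
            v[s] = td0_update(v, s, reward, s_next, alpha, gamma)
            s = s_next
    return v[1:-1]


if __name__ == "__main__":
    values = td0_random_walk()
    # The true values for this walk are 1/6, 2/6, ..., 5/6.
    print([round(x, 2) for x in values])
```

With a constant step size the estimates keep fluctuating around the true values rather than converging exactly; lowering alpha reduces the noise at the cost of slower learning.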
Stanford CS234: the YouTube lecture videos for this class work through the derivations and proofs.
You will need these. Practice on the assignments on your own.
https://www.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpTEbuOSosZdX
POMDPs and RL: self-driving cars, air control systems, and most other real-world systems use POMDPs + RL. These are mentioned briefly in the Grokking book but are explained in detail here: https://aa228v.stanford.edu/
The AA228V videos are on YouTube.
How to build LLMs: https://stanford-cs336.github.io/spring2025/
This course collects the relevant material in one place, in more detail than any blog post or YouTube video, and includes exercises with starter code. Topics include how to train foundation models, what MoEs are, fine-tuning, benchmarking, and more.
Manning RL Resources
https://www.manning.com/books/grokking-deep-reinforcement-learning


Every 1st Wednesday of the month until June 30, 2025