
RL Working Group and OpenEvolve Demo by Asankhaya Sharma

Hosted By
doug c. and Anup K.

Details

We have a potential collaboration with Manning Publications.
We are using Grokking Deep Reinforcement Learning by Miguel Morales. Make your own demos and present them to the group.

Gridworld demos:
https://colab.research.google.com/gist/dougc333/1e2a97a6589b930ac3ec66e647223417/untitled0.ipynb#scrollTo=cThyggKOCuIV&line=1&uniqifier=1

Individual Project Review
RL: the fundamental RL concepts are required for any work with LLMs and agents. For example, the exploration/exploitation tradeoff appears in the config.yaml files for OpenEvolve. You work on your own projects and use the weekly meetups to motivate progress.
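
To make the exploration/exploitation tradeoff concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. This is a textbook illustration, not how OpenEvolve implements the tradeoff; the function name, the arm means, and the parameter values are all made up for the example.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (illustrative, hypothetical setup)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

# Assumed arm means, chosen only for this demo.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print("estimated means:", [round(q, 2) for q in estimates])
print("pull counts:", counts)
```

Raising epsilon spends more pulls on exploration, while lowering it commits earlier to whichever arm currently looks best.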

AlphaEvolve from DeepMind is an example of how to structure LLM coding agents. These are far more advanced than the copy-and-paste suggestions in VS Code.

There is a rudimentary open-source version of it called OpenEvolve.
We will review how to get it working.
https://github.com/codelion/openevolve

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Coursera RL: 4 courses (Fundamentals, Sampling, Approximation, Capstone). Start here. We are on Course #2 (Sampling), currently covering TD(0).
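
For anyone catching up on TD(0), here is a minimal sketch of tabular TD(0) prediction on a 5-state random walk. The environment, function name, and hyperparameters are assumptions picked for illustration; they are not taken from the Coursera assignments.

```python
import random

def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) value prediction for a uniform random policy (illustrative)."""
    rng = random.Random(seed)
    n_states = 5
    values = [0.5] * n_states  # initial value estimates for non-terminal states

    for _ in range(episodes):
        state = n_states // 2  # every episode starts in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                # Left terminal: reward 0, terminal value 0.
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                # Right terminal: reward 1, terminal value 0.
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error

            if done:
                break
            state = next_state

    return values

values = td0_random_walk()
print([round(v, 2) for v in values])
```

For this walk the true values are 1/6, 2/6, 3/6, 4/6, 5/6 from left to right, so the printed estimates should increase from left to right and land near those numbers, with some noise from the constant step size.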

Stanford CS234: the derivations are worked through in the YouTube videos for this class.
You will need these videos for the derivations and proofs. Practice on the assignments on your own.
https://www.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpTEbuOSosZdX

POMDPs and RL: self-driving cars, air control systems, and most other real-world systems use POMDPs + RL. These are mentioned briefly in the grokking book but are explained in detail here: https://aa228v.stanford.edu/
The aa228v videos are on YouTube.
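
What separates a POMDP from an MDP is that the agent tracks a belief (a probability distribution over hidden states) instead of a known state. Below is a minimal sketch of the discrete Bayesian belief update, b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s). The two-state example is loosely modeled on the classic tiger problem; the numbers and names are assumptions for illustration and are not taken from the aa228v materials.

```python
def belief_update(belief, transition, obs_likelihood):
    """One discrete Bayes-filter step for a fixed action and observation.

    belief[s]          : prior probability of hidden state s
    transition[s][s2]  : P(s2 | s, action) for the action that was taken
    obs_likelihood[s2] : P(observation | s2) for the observation received
    """
    n = len(belief)
    # Predict: push the prior belief through the transition model.
    predicted = [
        sum(transition[s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    # Correct: weight each state by how well it explains the observation.
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]

# Hypothetical setup: states are (tiger-left, tiger-right).
belief = [0.5, 0.5]                 # start fully uncertain
stay = [[1.0, 0.0], [0.0, 1.0]]     # listening does not move the tiger
hear_left = [0.85, 0.15]            # assumed P(hear-left | state)

after_one = belief_update(belief, stay, hear_left)
after_two = belief_update(after_one, stay, hear_left)
print([round(b, 4) for b in after_one])
print([round(b, 4) for b in after_two])
```

Each consistent observation sharpens the belief, which is why a POMDP policy maps beliefs, rather than raw observations, to actions.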

How to build LLMs: https://stanford-cs336.github.io/spring2025/
This course collects the relevant information in one place, in more detail than any blog post or YouTube video, and includes starter-code exercises. Topics include how to train foundation models, what MoEs (mixture-of-experts models) are, fine-tuning, benchmarking, etc.

Manning RL Resources
https://www.manning.com/books/grokking-deep-reinforcement-learning

Silicon Valley Hands On Programming Events

Every 1st Wednesday of the month until June 30, 2025

Online event
This event has passed