Name: RL Work Sessions
Start: 2026-02-25T18:30:00-08:00
End: 2026-02-25T19:30:00-08:00

RL Working Group:

[https://forms.gle/YQtee41jxzRUjXKy8](https://forms.gle/YQtee41jxzRUjXKy8)

Participants collaborate on projects that range from homework assignments to paper reimplementations. This isn't a class, but you will need some minimal background to be able to contribute. [Register](https://forms.gle/MYXD9uSpbU6gcncB6)

**[Proposal](https://github.com/dougc333/amd_imitationlearning): Imitation learning to improve BrowserGym leaderboard benchmarks for open source models.**
Example of a relevant 500k/year job: [Jobs](https://x.com/adcock_brett/status/2018417226895028414)

1. Colab with Ollama on an A100 or better.
2. Verifiable rewards and Codex search for generating agent code. The definition of agents has changed dramatically in the last month after OpenClaw and Codex. Don't ship a model with weights; ship a cloud instance running self-verifying code that automatically fixes bugs as they come up.
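
To make "verifiable rewards" in item 2 concrete, here is a minimal sketch, assuming the reward is simply "does the generated code pass its tests". The names `verifiable_reward` and `first_passing` are hypothetical and not taken from the proposal repo. A plain subprocess is not a sandbox, so a real setup would need proper isolation before running model-generated code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional


def verifiable_reward(candidate_source: str, test_source: str,
                      timeout_s: float = 5.0) -> float:
    """Return 1.0 if the candidate code passes the tests, else 0.0.

    Assumption: tests are plain `assert` statements appended after the
    candidate code, so a non-zero exit code means failure.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "check.py"
        script.write_text(candidate_source + "\n\n" + test_source + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # Treat hangs as failures.
    return 1.0 if result.returncode == 0 else 0.0


def first_passing(candidates: List[str], test_source: str) -> Optional[str]:
    """Return the first candidate with reward 1.0, or None if all fail."""
    for source in candidates:
        if verifiable_reward(source, test_source) == 1.0:
            return source
    return None


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    tests = "assert add(2, 3) == 5"
    print(verifiable_reward(good, tests))  # 1.0
    print(verifiable_reward(bad, tests))   # 0.0
```

The same binary signal can score samples during RL training or filter candidate fixes at run time; which one the project needs is still an open design choice.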

Looking for projects? The class websites are good starting points.

We started here a year ago: [Coursera RL](https://communityinviter.com/apps/vllm-dev/join-vllm-developers-slack)

Current techniques for RL: [Kevin Murphy's RL Notes](https://x.com/sirbayes/status/1996057579756069105)

[Multi-agent systems](https://x.com/guohao_li/status/2010899322825744745) are the next step in LLM applications. One version: [eigent](https://github.com/eigent-ai/eigent)

[cs234 Spring 2024 YT Videos](https://www.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpTEbuOSosZdX)
[cs224r Deep Reinforcement Learning Class website](https://cs224r.stanford.edu/)
Create agent apps using [web actions](https://webarena.dev/)

[cs224r YT Videos](https://www.youtube.com/watch?v=EvHRQhMX7_w&list=PLoROMvodv4rPwxE0ONYRa_itZFdaKCylL)
There are a couple hundred projects at the cs224r website. Practice here with the same format for your projects. You have the luxury of additional time.

1. Build some prototypes to get a proof of concept and check feasibility.
2. Talk with Professor Huang to see whether what you plan to do makes sense.
3. Fill out a proposal with AMD for GPU cluster time.
4. Read the [CS224R custom project guidelines](https://cs224r.stanford.edu/material/CS224R_Custom_Project_Guidelines.pdf).
5. Use the Overleaf cs224r project template: https://drive.google.com/file/d/1TdXav51fMSQPjT83Ajdz3ZRRMB6xnhjB/view

[cs224r projects](https://cs224r.stanford.edu/projects/cs224r_final_projects.html)
[vLLM GitHub](https://github.com/vllm-project/vllm)
[vLLM office hours](https://www.youtube.com/watch?v=-5n9_IxkLxo); you can ask questions here.
[vLLM Slack channel](https://communityinviter.com/apps/vllm-dev/join-vllm-developers-slack); you will have to answer a basic technical question to get in. No, we don't give you the answer.
[vLLM production stack](https://github.com/vllm-project/production-stack)
[nano-vllm](https://github.com/GeeeekExplorer/nano-vllm) for learning.
vLLM is OK for non-distributed models.
If you need distributed models, use [SGLang](https://github.com/sgl-project/sglang).
[mini-SGLang](https://github.com/sgl-project/mini-sglang) for learning.

[Free GPU Time sponsored by AMD](https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html)
They give everyone $100 in free credit, no questions asked. Additional time is available after [project approval](https://anchor.digitalocean.com/amd-cloud-free-credit.html).

[Professor Huang's office hours on RL and computer vision](https://jbhuang0604.github.io/)
[Professor Huang's open office hour](https://jbhuang0604.github.io/#open-office-hour)

doug chang

Jonathan Oakey

Silicon Valley Hands On Programming Events

Online event
