RL Work Sessions

Name: RL Work Sessions
Start: 2026-02-11T18:30:00-08:00
End: 2026-02-11T19:30:00-08:00

Hosted by doug c. and Jonathan O.

Silicon Valley Hands On Programming Events

Details

RL Working Group:
We are going to review a few things. These are essential skills before starting an LLM or agentic project beyond 1 GPU.

Dashboards and backend code for creating and displaying PPO policy parameter sweeps in single-GPU mode. Distributed PPO is a separate topic and will require a different env than Ant/Walker/Hopper. This assumes basic knowledge of React. If you don't have it, you can pick it up quickly by iterating with Claude Code until things work. This is different in 2026 vs. the pre-LLM days.
Large-scale data cleaning, acquisition, and synthesis. Not sure this is necessary. We have been here since 2007, starting with Hadoop and Roman's series on Bigtop. What is new is synthesis using LLMs. This is different for each project.
Deploying vLLM to an AMD single CPU droplet. Multiple CPUs/GPUs are a different topic. Create a dashboard for billing users. Oddly, billing isn't the default demo, but it should be, because it touches different skills required for jobs. The Helm file production stack for vLLM isn't a good portfolio demo. It doesn't test or set a floor on the basic visualization, backend processing, and cloud API skills required in a job.
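
To make the billing item concrete, here is a minimal sketch of the backend piece a billing dashboard would sit on. It assumes you log one record per request with the prompt_tokens and completion_tokens counts that vLLM's OpenAI-compatible server returns in the usage field of a response. The prices, the UsageRecord type, and the function names are hypothetical placeholders, not part of vLLM.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens. Real rates depend on your GPU cost.
PROMPT_PRICE_PER_1K = 0.0005
COMPLETION_PRICE_PER_1K = 0.0015


@dataclass
class UsageRecord:
    """One logged request: who made it and how many tokens it used."""
    user_id: str
    prompt_tokens: int
    completion_tokens: int


def cost_usd(record: UsageRecord) -> float:
    """Cost of a single request under the placeholder prices above."""
    prompt_cost = record.prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
    completion_cost = record.completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


def bill_by_user(records: list[UsageRecord]) -> dict[str, float]:
    """Sum request costs per user; this is the table a dashboard would display."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.user_id] += cost_usd(record)
    return {user: round(total, 6) for user, total in totals.items()}


if __name__ == "__main__":
    demo = [
        UsageRecord("alice", prompt_tokens=1000, completion_tokens=2000),
        UsageRecord("bob", prompt_tokens=500, completion_tokens=0),
    ]
    for user, total in bill_by_user(demo).items():
        print(f"{user}: ${total:.6f}")
```

A real version would read the records from a database and expose bill_by_user through an API endpoint that the React dashboard calls.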

Participants collaborate with others. Projects range from homework assignments to reimplementation of papers. This isn't a class. There is some minimal background you will need to be able to contribute. Register

Proposal: Imitation learning to improve BrowserGym leaderboard benchmarks for open source models.

Looking for projects? The class websites are good starting points.

We started here a year ago:
Coursera RL
Current techniques for RL:
Kevin Murphy's RL Notes

MultiAgent systems are the next step in LLM applications. version

cs234 Spring 2024 YT Videos
cs224r Deep Reinforcement Learning Class website
Create agent apps using web actions

cs224r YT Videos
There are a couple hundred projects at the cs224r website. Practice here with the same format for your projects. You have the luxury of additional time.

Build some prototypes to get a proof of concept and check feasibility.
Talk with Professor Huang and see if what you are going to do makes sense.
Fill out a proposal with AMD for GPU cluster time.
https://cs224r.stanford.edu/material/CS224R_Custom_Project_Guidelines.pdf
Overleaf cs224r project template: https://drive.google.com/file/d/1TdXav51fMSQPjT83Ajdz3ZRRMB6xnhjB/view

cs224r projects
vLLM Github
vLLM OH; you can ask questions here
vLLM Slack channel; you will have to answer a basic technical question to get in. No, we don't give you the answer.
vLLM production stack
nanovllm for learning
vLLM is OK for non-distributed models
If you need distributed, use SGLang
miniSGLang for learning

Free GPU Time sponsored by AMD
They give everyone $100 free, no questions asked. Additional time is available after project approval.

Professor OH on RL and Computer Vision
Professor Huang OH

AI summary

By Meetup

RL Work Sessions: a collaborative RL group for learners with some background to produce a proof-of-concept RL prototype.
