RL Work Sessions
Details
RL Working Group:
Participants collaborate with others. Projects range from homework assignments to reimplementations of papers. This isn't a class, but you will need some minimal background to be able to contribute.
Proposal: Imitation learning to improve BrowserGym leaderboard benchmarks for open source models.
Looking for projects? The class websites are good starting points.
We started here a year ago:
Coursera RL
Current techniques for RL:
Kevin Murphy's RL Notes
Multi-agent systems are the next step in LLM applications.
cs234 Spring 2024 YT Videos
cs224r Deep Reinforcement Learning Class website
Create agent apps using web actions
cs224r YT Videos
There are a couple hundred projects on the cs224r website. Practice here using the same format for your own projects. You have the luxury of additional time.
- Build some prototypes to establish proof of concept and feasibility.
- Talk with Professor Huang to see if what you are going to do makes sense.
- Fill out a proposal with AMD for GPU cluster time.
- https://cs224r.stanford.edu/material/CS224R_Custom_Project_Guidelines.pdf
- Overleaf cs224r project template: https://drive.google.com/file/d/1TdXav51fMSQPjT83Ajdz3ZRRMB6xnhjB/view
cs224r projects
vLLM GitHub
vLLM OH: you can ask questions here
vLLM Slack channel: you will have to answer a basic technical question to get in. No, we don't give you the answer.
vLLM production stack
nanovllm for learning
vLLM is OK for non-distributed models.
If you need distributed models, use SGLang.
miniSGLang for learning
Free GPU Time sponsored by AMD
They give everyone $100 free, no questions asked. Additional time is available after project approval.
Professor OH on RL and Computer Vision
Professor Huang OH
AI summary
By Meetup
Invite-only RL working group for developers to demo RL algorithms, agents, and vLLM projects; apply for AMD GPU cloud time and share progress updates to AMD.
