Navigating Frontier RL for LLMs: Moving Beyond the Narrow Regime
Bucharest Deep Learning is back with another exciting session! Join us for a deep dive into the realities of scaling Reinforcement Learning for Large Language Models alongside Teodor Poncu, Member of Engineering (Reinforcement Learning) @ poolside, exploring what happens when we push beyond standard training constraints.
The Talk: Navigating Frontier RL: Moving Beyond the Narrow Regime
Current work on RL for LLMs, spanning algorithms such as PPO, GRPO, and DAPO as well as the broader RLVR (reinforcement learning with verifiable rewards) setting, is largely drawn from a narrow, comfortable regime: a single domain, small to mid-size models, synchronous training loops, clean rewards, and a limited number of steps. But what happens when we scale up?
Frontier RL departs from this narrow regime along several critical axes at once, and this presentation explores where those departures actually matter. The talk will cover four key areas: the infrastructure challenges that arise when rollouts dominate iteration time and demand staleness tolerance; the complexities of multi-domain training, where heterogeneous tasks make naive batching untenable; the formulation of RL scaling laws as a function of compute; and, finally, how to handle the numerical instabilities driven by modern architectures such as Mixture-of-Experts (MoE) models.
Logistics:
- Date & Time: Tuesday, June 23 | 18:30 - 19:30
- Location: FMI New Building (Politehnica Business Tower)
- Address: Bulevardul Iuliu Maniu, nr. 15G, Etaj 5, Room 503
