Beyond Pass@k: Breadth-Depth Metrics for Reasoning Boundaries
Details
Bucharest Deep Learning is back with another exciting session! Join us for a deep dive into Reasoning and RLVR benchmarking alongside Marius Dragoi, presenting his recent paper on evaluating the true limits of LLMs.
The Talk: Beyond Pass@k: Breadth-Depth Metrics for Reasoning Boundaries
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm to improve Large Language Models on reasoning tasks. However, assessing the actual reasoning boundary of these models relies heavily on the Pass@k metric, which can be highly misleading: given a large enough number of trials, a model can arrive at a correct answer through random guessing alone, and Pass@k still counts the problem as solved.
The paper introduces Cover@τ, a novel metric that better measures the improvement given by RLVR. Cover@τ highlights a clear trade-off between the variety of problems solved and the reliability of the model's problem solving. This new approach reveals a different ranking of popular RLVR algorithms and provides a much more accurate perspective on true reasoning boundaries.
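For attendees who want some intuition before the talk, the Pass@k concern described above can be illustrated with a short calculation. This is only a sketch under simplifying assumptions that are not taken from the paper: each sample is treated as an independent guess with a fixed success probability, and the function name `pass_at_k_by_guessing` and the 1% guess rate are hypothetical choices made for illustration.

```python
def pass_at_k_by_guessing(p_guess: float, k: int) -> float:
    """Probability that at least one of k independent guesses is correct.

    Assumption (illustrative only): every sample succeeds independently
    with the same probability p_guess.
    """
    return 1.0 - (1.0 - p_guess) ** k


if __name__ == "__main__":
    # Hypothetical scenario: a model that is right only 1% of the time
    # on a given problem, evaluated with growing sampling budgets.
    for k in (1, 16, 256, 1024):
        print(f"k={k:5d}  chance of at least one correct sample: "
              f"{pass_at_k_by_guessing(0.01, k):.3f}")
```

Under these assumptions the chance of at least one correct sample approaches 1 as k grows, even though the per-sample reliability never changes, which is the kind of inflation a reliability-aware metric aims to expose.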
Logistics:
- Date & Time: Tuesday, April 21 | 18:30 - 19:30
- Location: FMI New Building (Politehnica Business Tower)
- Address: Bulevardul Iuliu Maniu, nr. 15G, Etaj 5, Room 503
