Details

Bucharest Deep Learning is back with another exciting session! Join us for a deep dive into Reasoning and RLVR benchmarking alongside Marius Dragoi, presenting his recent paper on evaluating the true limits of LLMs.

The Talk: Beyond Pass@k: Breadth-Depth Metrics for Reasoning Boundaries

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for improving Large Language Models on reasoning tasks. However, assessing the actual reasoning boundary of these models relies heavily on the Pass@k metric, which can be highly misleading: given a large enough number of trials, a model can reach a correct answer through random guessing alone, so a high Pass@k does not necessarily reflect reliable reasoning.
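To make the guessing effect concrete, here is a minimal Python sketch. It is not taken from the paper. The function names are illustrative, the 25% per-sample guessing rate is an assumed example (for instance, a four-option multiple-choice question), and `pass_at_k` uses the commonly cited unbiased estimator computed from n samples of which c are correct.

```python
from math import comb


def pass_at_k_guessing(p: float, k: int) -> float:
    """Chance that at least one of k independent guesses is correct,
    when each guess is right with probability p (assumed i.i.d. guesses)."""
    return 1.0 - (1.0 - p) ** k


def pass_at_k(n: int, c: int, k: int) -> float:
    """Commonly used unbiased Pass@k estimate for one problem,
    given n sampled answers of which c are correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Assumed example: a guesser that is right 25% of the time per sample.
    for k in (1, 4, 16, 64):
        print(f"k={k:>2}  Pass@k of pure guessing = {pass_at_k_guessing(0.25, k):.4f}")
```

Under these assumptions, a pure guesser already exceeds 98% Pass@16, which is why a large k can blur the line between reasoning and luck.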

The paper introduces Cover@τ, a novel metric that better measures the improvement given by RLVR. Cover@τ highlights a clear trade-off between the variety of problems solved and the reliability of the model's problem solving. This new approach reveals a different ranking of popular RLVR algorithms and provides a much more accurate perspective on true reasoning boundaries.
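This announcement does not spell out how Cover@τ is computed. One reading consistent with the breadth-versus-reliability description is the fraction of problems whose per-problem success rate is at least τ. The sketch below assumes that reading; the function name `cover_at_tau` and the sample success rates are hypothetical, and the paper remains the reference for the exact definition.

```python
from typing import Sequence


def cover_at_tau(success_rates: Sequence[float], tau: float) -> float:
    """Fraction of problems solved with per-problem success rate >= tau.

    Assumed reading of Cover@tau, not the paper's verbatim definition.
    success_rates[i] is the share of sampled answers that were correct
    for problem i.
    """
    if not success_rates:
        raise ValueError("need at least one problem")
    covered = sum(1 for rate in success_rates if rate >= tau)
    return covered / len(success_rates)


if __name__ == "__main__":
    # Hypothetical per-problem success rates for four problems.
    rates = [1.0, 0.5, 0.02, 0.0]
    for tau in (0.01, 0.5, 1.0):
        print(f"tau={tau:<4}  coverage={cover_at_tau(rates, tau):.2f}")
```

With these made-up rates, coverage falls from 0.75 at τ = 0.01 to 0.25 at τ = 1.0: a low threshold rewards breadth, including rarely solved problems, while a high threshold counts only problems the model solves reliably.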

Logistics:

  • Date & Time: Tuesday, April 21 | 18:30 - 19:30
  • Location: FMI New Building (Politehnica Business Tower)
  • Address: Bulevardul Iuliu Maniu, nr. 15G, Etaj 5, Room 503
