
[PDG 450] Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Hosted by: DavidFarago

Details

Link to article: https://arxiv.org/pdf/2505.03335
Title: Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Content: The Absolute Zero paradigm introduces AZR (Absolute Zero Reasoner), a system that generates its own training tasks and improves reasoning abilities without any external data, using a code executor to validate tasks and verify answers as a unified reward source. This approach addresses scalability concerns of current reinforcement learning methods that still depend on human-curated question-answer datasets, even when they avoid direct supervision of reasoning processes. Despite training entirely without external data, AZR achieves state-of-the-art performance on coding and mathematical reasoning benchmarks, outperforming existing zero-setting models that rely on tens of thousands of human examples.
Slack link: ml-ka.slack.com, channel: #pdg. Please join us there. If you cannot join, message us here or email mlpaperdiscussiongroupka@gmail.com.
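
For readers preparing for the discussion, the idea of using a code executor both to validate proposed tasks and to verify answers can be illustrated with a toy sketch. This is not the paper's implementation. The function names (`run_program`, `validate_task`, `solver_reward`), the task format (a Python snippet defining `f(x)`), and the binary reward are assumptions made purely for illustration.

```python
from typing import Any, Optional


def run_program(source: str, arg: Any) -> Optional[Any]:
    """Run `source`, which is assumed to define f(x), and return f(arg).

    Returns None on any error. Hypothetical helper: exec() here is NOT
    sandboxed, so this is only suitable for trusted toy input.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        return namespace["f"](arg)
    except Exception:
        return None


def validate_task(source: str, arg: Any) -> Optional[Any]:
    """Accept a proposed task only if it runs and yields the same output twice.

    Simplification: a program that legitimately returns None is rejected too.
    """
    first = run_program(source, arg)
    second = run_program(source, arg)
    if first is None or first != second:
        return None
    return first


def solver_reward(source: str, arg: Any, predicted: Any) -> float:
    """Reward 1.0 if the solver's predicted output matches the executed output."""
    expected = validate_task(source, arg)
    if expected is None:
        return 0.0  # invalid task: nothing to verify against
    return 1.0 if predicted == expected else 0.0


if __name__ == "__main__":
    task = "def f(x):\n    return sorted(x)[::-1]"
    print(solver_reward(task, [3, 1, 2], [3, 2, 1]))  # correct prediction
    print(solver_reward(task, [3, 1, 2], [1, 2, 3]))  # wrong prediction
```

In this sketch the executor is the only source of ground truth: no human-written answer key is involved, which is the property the summary above highlights.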

In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.

AI Paper Discussion Group Karlsruhe
FREE