[PDG 463] Critique-Coder: Enhancing Coder Models by Critique RL
Details
Link to article: https://arxiv.org/pdf/2509.22824
Title: Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Track: coding LLMs / RL
Content: This paper introduces Critique Reinforcement Learning (CRL), a training method in which the model writes a critique of a given solution and is rewarded when its final judgment matches the ground-truth label. CRL is combined with standard RL in a hybrid approach called CRITIQUE-CODER. The resulting models consistently outperform RL-only baselines across multiple benchmarks, with the 8B model achieving over 60% on LiveCodeBench. The authors also report that training on coding critiques improves reasoning abilities that transfer to other tasks.
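As a starting point for the discussion, the reward idea in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the rewards are assumed to be binary, and the standard RL reward is assumed to mean "all unit tests pass". How the paper actually mixes the two kinds of samples is not specified here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class CritiqueSample:
    """Hypothetical record for a critique task (CRL)."""
    predicted_correct: bool  # model's final verdict on the candidate solution
    label_correct: bool      # ground-truth label for that solution


@dataclass
class GenerationSample:
    """Hypothetical record for a standard RL code-generation task."""
    tests_passed: int
    tests_total: int


def critique_reward(sample: CritiqueSample) -> float:
    # Assumption: binary reward, 1.0 when the verdict matches the label.
    return 1.0 if sample.predicted_correct == sample.label_correct else 0.0


def generation_reward(sample: GenerationSample) -> float:
    # Assumption: binary reward, 1.0 only when every test passes.
    all_passed = sample.tests_total > 0 and sample.tests_passed == sample.tests_total
    return 1.0 if all_passed else 0.0


def batch_rewards(samples: List[Union[CritiqueSample, GenerationSample]]) -> List[float]:
    # A hybrid batch contains both task types; each gets its own reward rule.
    rewards = []
    for sample in samples:
        if isinstance(sample, CritiqueSample):
            rewards.append(critique_reward(sample))
        else:
            rewards.append(generation_reward(sample))
    return rewards


if __name__ == "__main__":
    batch = [
        CritiqueSample(predicted_correct=False, label_correct=False),
        CritiqueSample(predicted_correct=True, label_correct=False),
        GenerationSample(tests_passed=5, tests_total=5),
        GenerationSample(tests_passed=4, tests_total=5),
    ]
    print(batch_rewards(batch))
```

Note that in this sketch a critique is rewarded for judging correctly, even when the solution it judges is wrong, whereas a generation sample is rewarded only for producing working code.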
Slack link: ml-ka.slack.com, channel: #pdg. Please join us. If you cannot join, please message us here or email us at mlpaperdiscussiongroupka@gmail.com.
In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.
