Details

Link to article: https://arxiv.org/pdf/2509.22824
Title: Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Track: coding LLMs / RL
Content: This paper introduces Critique Reinforcement Learning (CRL), a training method in which the model generates a critique of a candidate solution and is rewarded when its judgment matches the ground-truth label. CRL is combined with standard RL in a hybrid approach called CRITIQUE-CODER. The resulting models consistently outperform RL-only baselines across multiple benchmarks, with the 8B model achieving over 60% on LiveCodeBench. The authors also report that training on coding critiques improves reasoning abilities that transfer to diverse other tasks.
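As a starting point for the discussion, here is a minimal sketch of what a CRL-style reward could look like. It is not the authors' implementation: the "Judgment: correct/incorrect" output format, the function names, and the binary 0/1 reward are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed (hypothetical) output format: the critique states its verdict as
# "Judgment: correct" or "Judgment: incorrect". The paper may use another format.
_JUDGMENT = re.compile(r"judgment:\s*(correct|incorrect)\b", re.IGNORECASE)


def parse_judgment(critique: str) -> Optional[bool]:
    """Return True/False for the last verdict in the critique, or None if absent."""
    matches = _JUDGMENT.findall(critique)
    if not matches:
        return None
    return matches[-1].lower() == "correct"


def critique_reward(critique: str, solution_is_correct: bool) -> float:
    """Reward 1.0 when the critique's verdict matches the ground-truth label."""
    judgment = parse_judgment(critique)
    if judgment is None:
        # Assumption: a critique without a parseable verdict earns no reward.
        return 0.0
    return 1.0 if judgment == solution_is_correct else 0.0
```

In this sketch the reward depends only on the final verdict, not on the text of the critique itself, which mirrors the summary's description of rewards based on agreement with ground-truth labels.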
Slack link: ml-ka.slack.com, channel: #pdg. Please join us. If you cannot join, message us here or email mlpaperdiscussiongroupka@gmail.com.

In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.

Related topics

Artificial Intelligence
Deep Learning
Machine Learning
Natural Language Processing
Neural Networks
