The surprising efficiency of recurrent reasoning models
Details
The MLAI Meetup is a community of AI researchers and professionals that hosts monthly talks on exciting research. Our format is:
- 6:00 - 6:20: Socializing
- 6:20 - 6:40: Announcements and AI news
- 6:40 - 7:40: Talk(s) and Q&A
- 7:40 - 8:00: Networking
- 8:00: Head to the nearest pub for dinner
Long Dang & David Rawlinson: "The surprising efficiency of recurrent reasoning models"
Abstract: Large Language Models (LLMs) still struggle with reasoning problems, defined as devising and executing complex, goal-oriented action sequences. Current solutions, such as Chain-of-Thought (CoT) and Test-Time Compute (TTC) techniques, can suffer from brittle task decomposition. In addition, auto-regressive output generation is prone to errors, which usually cannot be rectified.
In 2025, Wang et al. introduced the Hierarchical Reasoning Model (HRM) (1). On three reasoning problems (Extreme Sudoku, maze navigation, and ARC-AGI tasks), HRM demonstrated performance comparable to large, pre-trained LLMs with orders of magnitude fewer trainable parameters and no pre-training. HRM uses a process of repeated recurrent convergence between two modules to produce a latent representation of a problem solution.
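For intuition, here is a minimal sketch of that nested convergence loop: a fast low-level module refines its state several times for each update of a slow high-level module, and repeating these cycles drives the latent toward a solution. This is not the authors' code; the use of GRU cells, the dimensions, and the loop counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Illustrative sketch of HRM-style nested recurrence (not the authors' code)."""
    def __init__(self, dim: int = 128, n_cycles: int = 4, t_low: int = 4):
        super().__init__()
        self.low = nn.GRUCell(dim, dim)   # stand-in for the fast low-level module
        self.high = nn.GRUCell(dim, dim)  # stand-in for the slow high-level module
        self.n_cycles, self.t_low = n_cycles, t_low

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z_low = torch.zeros_like(x)
        z_high = torch.zeros_like(x)
        for _ in range(self.n_cycles):       # slow, high-level cycles
            for _ in range(self.t_low):      # fast, low-level refinement steps
                z_low = self.low(x + z_high, z_low)
            z_high = self.high(z_low, z_high)  # one high-level update per cycle
        return z_high                          # latent representing the solution

model = HRMSketch()
latent = model(torch.randn(8, 128))  # a batch of 8 problem embeddings
```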
Shortly after, Jolicoeur-Martineau released a pre-print (2) describing a thorough ablation of the ideas in HRM and introducing her derivative model, the Tiny Recursive Model (TRM). TRM uses a similar recursive convergence process with even fewer parameters, yet obtains 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., DeepSeek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of their parameters.
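TRM's recursion can be pictured similarly, but with a single tiny network reused throughout: an inner loop refines a reasoning latent z, and an outer loop refines the answer embedding y. The sketch below is a loose paraphrase under assumed shapes, not the released implementation.

```python
import torch
import torch.nn as nn

class TRMSketch(nn.Module):
    """Illustrative sketch of TRM-style recursion: one small network, reused."""
    def __init__(self, dim: int = 128, n_inner: int = 6, n_outer: int = 3):
        super().__init__()
        # A single tiny MLP performs every refinement step.
        self.net = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.n_inner, self.n_outer = n_inner, n_outer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.zeros_like(x)  # current answer embedding
        z = torch.zeros_like(x)  # reasoning latent
        for _ in range(self.n_outer):
            for _ in range(self.n_inner):
                # Refine the reasoning latent from (question, answer, latent).
                z = self.net(torch.cat([x, y, z], dim=-1))
            # Refine the answer from the converged latent (simplified here:
            # the paper's answer update does not condition on x).
            y = self.net(torch.cat([x, y, z], dim=-1))
        return y
```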
Finally, Dang and Rawlinson's 2025 preprint (3) explores HRM as a reinforcement learning agent, allowing it to be applied to reasoning problems that are dynamic, uncertain, or partially observable, or where the "correct" action is undefined (HRM and TRM use supervised learning). They demonstrate that computation from previous environment time-steps can be reused while executing a plan, which is crucial to efficiency and continuity of thought.
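That reuse idea can be sketched as a rollout loop in which the recurrent latent is carried across environment steps instead of being re-initialized. The agent interface below (initial_latent, reason, act) is hypothetical, invented purely for illustration; env follows a Gym-style reset/step API.

```python
def rollout(env, agent, max_steps: int = 100) -> float:
    """Sketch: carry the reasoning latent across environment time-steps.

    `agent.initial_latent`, `agent.reason`, and `agent.act` are hypothetical
    method names standing in for the recurrent model and its policy head.
    """
    obs = env.reset()
    z = agent.initial_latent()       # start from a blank latent only once
    total_reward = 0.0
    for _ in range(max_steps):
        z = agent.reason(obs, z)     # a few recurrent refinement steps
        action = agent.act(z)        # policy head reads the refined latent
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
        # z is deliberately NOT reset here: the next step's reasoning
        # resumes from this latent, reusing earlier computation.
    return total_reward
```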
References:
1- Hierarchical Reasoning Model
by Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, and Yasin Abbasi Yadkori (2025)
https://arxiv.org/abs/2506.21734
2- Less is More: Recursive Reasoning with Tiny Networks
by Alexia Jolicoeur-Martineau (2025)
https://arxiv.org/abs/2510.04871
3- HRM-Agent: Training a recurrent reasoning model in dynamic environments using reinforcement learning
by Long H Dang and David Rawlinson (2025)
https://arxiv.org/abs/2510.22832
Speaker bios:
Long Dang is currently working as a Data Scientist at WSP after graduating with a Bachelor of Computer Science from Monash University. He is interested in understanding how neural networks work and what interesting things can be done with them. His current hobbies are learning Japanese and watching YouTube.
David Rawlinson has worked in ML and AI R&D for over 25 years. He has a BSc in Computer Science and AI from Sussex University in the UK and a PhD in robotics and computer vision from Monash University in Melbourne. He currently works as a Principal Data Scientist for WSP, an engineering consulting company. He also maintains Causal Wizard (https://causalwizard.app), an ML software application that helps people apply causal inference to their data.
