Paper Group: Hierarchical Reasoning Model


Details
Join us for a paper discussion on "Hierarchical Reasoning Model"
Exploring brain-inspired recurrent architectures for deep computational reasoning without chain-of-thought supervision
Featured Paper:
"Hierarchical Reasoning Model" (Wang et al., 2025)
arXiv Paper | Code
Discussion Topics:
Brain-Inspired Architecture Design
- Two-module recurrent system: a high-level (H) module for slow, abstract planning and a low-level (L) module for fast, detailed computation (see the sketch after this list)
- Hierarchical convergence prevents premature RNN settling through temporal separation
- Multi-timescale processing mimics cortical theta (4-8Hz) and gamma (30-100Hz) rhythms
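A minimal sketch of that two-timescale recurrence, using hypothetical module and variable names (GRU cells stand in for HRM's recurrent blocks): the L module updates at every step, while the H module updates once per cycle of T low-level steps.

```python
# Illustrative sketch of a two-timescale H/L recurrence (hypothetical names;
# GRU cells stand in for the paper's recurrent blocks).
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    def __init__(self, dim: int, T: int = 8):
        super().__init__()
        self.T = T                            # low-level steps per high-level update
        self.low = nn.GRUCell(2 * dim, dim)   # L module: fast, detailed computation
        self.high = nn.GRUCell(dim, dim)      # H module: slow, abstract planning

    def forward(self, x: torch.Tensor, n_cycles: int = 4) -> torch.Tensor:
        zH = x.new_zeros(x.shape)             # high-level state
        zL = x.new_zeros(x.shape)             # low-level state
        for _ in range(n_cycles):
            for _ in range(self.T):
                # L runs every step, conditioned on the input and the (fixed) H state
                zL = self.low(torch.cat([x, zH], dim=-1), zL)
            # H updates once per cycle from L's locally converged state, handing L a
            # fresh context; this temporal separation drives hierarchical convergence
            zH = self.high(zL, zH)
        return zH

# Usage: TwoTimescaleCore(dim=256)(torch.randn(4, 256))
```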
Training Innovations
- One-step gradient approximation cuts training memory from O(T) for BPTT to O(1) (see the training-loop sketch after this list)
- Deep supervision provides intermediate feedback at each reasoning segment
- Adaptive Computation Time (ACT) with Q-learning determines optimal halt points
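A simplified training-loop sketch combining the one-step gradient approximation with deep supervision (hypothetical names and shapes, not the paper's code): each segment unrolls the recurrent core with gradients blocked except on the final step, so activation memory stays constant, and a loss is applied after every segment.

```python
# Deep supervision with a one-step gradient approximation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

cell = nn.GRUCell(64, 128)          # stands in for the HRM recurrent core
head = nn.Linear(128, 10)           # output head used for per-segment supervision
opt = torch.optim.AdamW(list(cell.parameters()) + list(head.parameters()), lr=1e-3)

def train_step(x, y, n_segments=4, steps_per_segment=8):
    z = x.new_zeros(x.size(0), 128)
    last_loss = None
    for _ in range(n_segments):
        with torch.no_grad():                       # no gradient through the early
            for _ in range(steps_per_segment - 1):  # unroll steps (no BPTT)
                z = cell(x, z)
        z = cell(x, z)                              # only this step is on the autograd tape
        loss = F.cross_entropy(head(z), y)          # deep supervision after each segment
        opt.zero_grad()
        loss.backward()
        opt.step()
        z = z.detach()                              # next segment starts from a detached state
        last_loss = loss.item()
    return last_loss

# Usage: train_step(torch.randn(32, 64), torch.randint(0, 10, (32,)))
```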
Performance Benchmarks
| Task Domain | HRM (27M params) | Baseline Methods | Context Size |
| ----------- | ---------------- | ---------------- | ------------ |
| ARC-AGI Challenge | 40.3% | o3-mini: 34.5% | 30×30 grid |
| Sudoku-Extreme | Near-perfect | CoT: 0% | 9×9 grid |
| Maze-Hard (30×30) | Near-perfect | Direct: 0% | 900 tokens |
Implementation Challenges
- Hierarchical convergence timing between H and L modules (T timesteps per cycle)
- Q-learning stability without replay buffers or target networks (a halting sketch follows this list)
- Memory management for multi-segment deep supervision
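For discussion, a rough sketch of the ACT-style halting decision with a two-action Q-head (hypothetical names; the Q-learning targets and stability considerations are omitted): after each reasoning segment the model compares "halt" and "continue" Q-values and stops when halting looks better or the segment budget runs out.

```python
# ACT-style halting via a two-action Q-head (illustrative sketch only).
import torch
import torch.nn as nn

q_head = nn.Linear(128, 2)   # Q-values: index 0 = halt, index 1 = continue

def run_with_act(step_fn, z, max_segments: int = 16, min_segments: int = 1):
    for seg in range(1, max_segments + 1):
        z = step_fn(z)                            # one full reasoning segment
        q_halt, q_continue = q_head(z).mean(0)    # batch-averaged Q-values
        if seg >= min_segments and q_halt > q_continue:
            break                                 # adaptive stop: halting is preferred
    return z, seg
```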
Key Technical Features
- Trained from scratch with only 1000 examples per task (no pretraining)
- Turing-complete computational depth through recurrent processing
- Participation ratio analysis shows an emergent dimensionality hierarchy (H: 89.95 vs. L: 30.22); a short computation sketch follows this list
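The participation ratio behind those numbers is PR = (Σᵢ λᵢ)² / Σᵢ λᵢ², where λᵢ are the eigenvalues of the covariance matrix of a module's hidden states; a small sketch (variable names are illustrative, not from the paper's code):

```python
# Participation ratio of a set of hidden activations (illustrative sketch).
import torch

def participation_ratio(states: torch.Tensor) -> float:
    """states: (n_samples, n_units) matrix of hidden-state activations."""
    centered = states - states.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (states.shape[0] - 1)
    eig = torch.linalg.eigvalsh(cov)              # eigenvalues of the covariance matrix
    return (eig.sum() ** 2 / (eig ** 2).sum()).item()

# A higher PR for H-module states than L-module states reproduces the reported
# dimensionality hierarchy (roughly 90 vs. 30 in the paper's analysis).
```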
Neuroscientific Validation
- High-level module develops higher-dimensional representations than low-level
- Dimensionality hierarchy matches mouse cortex patterns (HRM H/L ratio ≈ 2.98 vs. ≈ 2.25 in cortex)
- Emergent property arising during training, not architectural artifact
Future Directions
- Integration with linear attention mechanisms for long-context efficiency
- Extension to multimodal reasoning tasks
- Causal analysis of dimensionality hierarchy necessity
Silicon Valley Generative AI has two meeting formats:
1. Paper Reading - Every second week we meet to discuss machine learning papers. This is a collaboration between Silicon Valley Generative AI and Boulder Data Science.
2. Talks - Once a month we meet for a presentation on a topic related to generative AI. Speakers range from industry leaders, researchers, startup founders, and subject matter experts to anyone with an interest in a topic they would like to share. Topics vary from technical to business focused: how the latest generative models work and how they can be used, applications and adoption of generative AI, demos of projects and startup pitches, or legal and ethical topics. The talks are meant to be inclusive and aimed at a more general audience than the paper readings.
If you would like to be a speaker or suggest a paper, email us at svb.ai.paper.suggestions@gmail.com or join our new Discord!
