Hierarchical Reasoning Model + Beyond the Black Box of Transformers | Two Talks
Details
This will be a journal club event
Two Talks:
1. Hierarchical Reasoning Model (link to paper)
2. Beyond the Black Box: Transformers as Interacting Particle Systems (link to paper)
Speakers
- Jason Gauci, Chief Scientist at Circuit
- Connor Favreau, Principal Data Scientist at Central Health
Abstracts
1. Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.
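To make the two-timescale structure described above concrete, here is a minimal sketch of the kind of nested recurrence the abstract describes: a low-level module that updates every step and a high-level module that updates once per cycle. All names, sizes, and update rules below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of HRM's nested fast/slow recurrence: the low-level
# module runs T steps per single high-level update, and the whole thing
# is one forward pass with no supervision of intermediate states.

rng = np.random.default_rng(0)
D = 16          # hidden size (illustrative)
T = 4           # low-level steps per high-level update (assumed)
N_CYCLES = 3    # high-level updates per forward pass (assumed)

# Random weights stand in for trained parameters.
W_low = rng.normal(scale=0.1, size=(D, 3 * D))   # reads low state, high state, input
W_high = rng.normal(scale=0.1, size=(D, 2 * D))  # reads high state, final low state

def forward(x):
    """One forward pass through the two interdependent recurrent modules."""
    z_low = np.zeros(D)
    z_high = np.zeros(D)
    for _ in range(N_CYCLES):
        for _ in range(T):
            # Fast, detailed computation, conditioned on the slow plan.
            z_low = np.tanh(W_low @ np.concatenate([z_low, z_high, x]))
        # Slow, abstract planning step, updated once per cycle.
        z_high = np.tanh(W_high @ np.concatenate([z_high, z_low]))
    return z_high

out = forward(rng.normal(size=D))
print(out.shape)  # (16,)
```

The point of the sketch is the control flow, not the math: computational depth comes from unrolling N_CYCLES × T recurrent steps inside a single forward pass, rather than from emitting an explicit chain-of-thought.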
2. Are LLMs and Transformers really black boxes? This talk explores a growing body of work that formalizes their internal dynamics, revealing strong mathematical parallels with interacting particle systems. In this perspective, words act as particles moving through semantic space, evolving over time with each transformer block. Under certain architectural constraints, the trajectories of these particles obey a continuity equation — leading to monotonic “energy-like” functionals. This framework helps explain clustering behavior observed in deeper layers and highlights intriguing connections between attention kernels and opinion dynamics in social physics.
We will follow A Mathematical Perspective on Transformers (https://arxiv.org/pdf/2312.10794), while also touching on results from related work:
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation (https://arxiv.org/pdf/2104.02308)
A Mathematical Theory of Attention (https://arxiv.org/pdf/2007.02876)
Sinkformers: Transformers with Doubly Stochastic Attention (https://arxiv.org/pdf/2110.11773)
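The particle picture from the second abstract can be simulated in a few lines. The sketch below follows the simplified self-attention dynamics studied in A Mathematical Perspective on Transformers (queries, keys, and values all set to the identity, with a layer-norm-like projection back onto the sphere); the step size, temperature, and dimensions are illustrative assumptions. Stacking identical "layers" makes the token-particles attract one another and cluster, which the pairwise similarity measure tracks.

```python
import numpy as np

# Toy tokens-as-particles simulation: points on the unit sphere evolve
# under a softmax attention kernel, one "layer" per step. This is a
# sketch of the simplified dynamics (Q = K = V = identity), not a full
# transformer block.

rng = np.random.default_rng(1)
n, d = 8, 3            # number of tokens (particles) and embedding dimension
beta = 4.0             # inverse temperature in the attention kernel
h = 0.1                # step size per layer

x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # project onto the sphere

def mean_cosine(x):
    """Average pairwise similarity; approaches 1 as particles cluster."""
    s = x @ x.T
    return (s.sum() - n) / (n * (n - 1))

before = mean_cosine(x)
for _ in range(200):                     # a deep stack of identical layers
    a = np.exp(beta * (x @ x.T))         # attention kernel exp(beta * <x_i, x_j>)
    a /= a.sum(axis=1, keepdims=True)    # row-wise softmax
    x = x + h * (a @ x)                  # each particle drifts toward its weighted mean
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # layer-norm-like projection
after = mean_cosine(x)
print(before, after)   # similarity increases as tokens cluster
```

Increasing the depth of the loop (or beta) strengthens the clustering, mirroring the behavior the talk attributes to deeper layers of real transformers.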
Info
Austin Deep Learning Journal Club is a group for committed machine learning practitioners and researchers alike. The group typically meets on the first Tuesday of each month to discuss research publications. The publications are usually ones that laid the foundation for ML/DL or explore novel, promising ideas, and are selected by a vote. Participants are expected to read the publications so they can contribute to the discussion and learn from others. This is also a great opportunity to showcase your implementations and get feedback from other experts.
Sponsors:
Thank you to Capital Factory for sponsoring Austin Deep Learning. Capital Factory is the center of gravity for entrepreneurs in Texas. They meet the best entrepreneurs in Texas and introduce them to their first investors, employees, mentors, and customers. To sign up for a Capital Factory membership, click here.
Antler: Meet your co-founders, join our global community, and access capital to build and scale your company faster.