
Details

Link to article: https://arxiv.org/pdf/2507.02092
Title: Energy-Based Transformers are Scalable Learners and Thinkers
Content: This paper introduces Energy-Based Transformers (EBTs), a new model architecture in which System 2 Thinking (inference-time reasoning) emerges from unsupervised learning alone. EBTs learn to verify how compatible a candidate prediction is with its input and produce predictions by energy minimization, which makes the approach both modality- and problem-agnostic. Compared to standard Transformers, EBTs scale better during training (up to 35% faster) and gain 29% more performance from additional inference-time computation, while also generalizing better to out-of-distribution data on both text and image tasks.
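
For intuition, here is a minimal sketch of the inference-time procedure the summary describes: a learned energy function scores how well a candidate prediction fits the input, and the prediction is refined by gradient descent on that energy. This is an illustrative assumption of the general idea, not the paper's implementation; all names (EnergyModel, predict_by_energy_minimization, n_steps, step_size) and the tiny MLP are hypothetical.

# Minimal sketch (assumed, not from the paper): energy-based prediction by
# gradient descent on a candidate, so more steps = more "thinking" at inference.
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Scores how compatible a candidate prediction is with the context."""
    def __init__(self, dim: int):
        super().__init__()
        # The real architecture uses a Transformer; an MLP keeps the sketch small.
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 1))

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # Lower energy means context and candidate are more compatible.
        return self.net(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def predict_by_energy_minimization(model, context, dim, n_steps=10, step_size=0.1):
    # Start from a random guess and iteratively refine it to reduce the energy.
    candidate = torch.randn(context.shape[0], dim, requires_grad=True)
    for _ in range(n_steps):
        energy = model(context, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        candidate = (candidate - step_size * grad).detach().requires_grad_(True)
    return candidate.detach()

if __name__ == "__main__":
    dim = 16
    model = EnergyModel(dim)
    context = torch.randn(4, dim)
    prediction = predict_by_energy_minimization(model, context, dim)
    print(prediction.shape)  # torch.Size([4, 16])

The key design point illustrated here is that the model is trained as a verifier (an energy over input-prediction pairs) rather than as a direct predictor, and additional computation can be spent at inference simply by running more minimization steps.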
Slack link: ml-ka.slack.com, channel: #pdg. Please join us; if you cannot join, please message us here or at mlpaperdiscussiongroupka@gmail.com.

In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.

Artificial Intelligence
Deep Learning
Machine Learning
Natural Language Processing
Neural Networks
