[PDG 458] Energy-Based Transformers are Scalable Learners and Thinkers
Details
Link to article: https://arxiv.org/pdf/2507.02092
Title: Energy-Based Transformers are Scalable Learners and Thinkers
Content: This paper introduces Energy-Based Transformers (EBTs), a new model architecture in which System 2 Thinking (inference-time reasoning) emerges from unsupervised learning alone. EBTs learn to verify the compatibility between an input and a candidate prediction and produce predictions by minimizing this energy, which makes the approach both modality- and problem-agnostic. In the paper's experiments, EBTs scale up to 35% faster during training and gain 29% more from additional inference-time computation than standard Transformers, while also generalizing better on out-of-distribution data across text and image tasks.
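To make the verify-then-minimize idea concrete, here is a minimal sketch of energy-minimization inference in PyTorch. Everything in it (the toy model, network sizes, step count, learning rate) is an illustrative assumption and not the authors' implementation: a learned scalar energy scores input-prediction compatibility, and a prediction is refined by gradient descent on that energy.

import torch

# Toy energy model (assumption, not the paper's architecture): lower energy
# means the prediction y is judged more compatible with the input x.
class ToyEnergyModel(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Scalar energy per example.
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def predict_by_energy_minimization(model, x, steps: int = 10, lr: float = 0.1):
    # "Thinking": start from a random guess and descend the energy w.r.t. y.
    y = torch.randn_like(x, requires_grad=True)
    for _ in range(steps):
        energy = model(x, y).sum()
        (grad,) = torch.autograd.grad(energy, y)
        y = (y - lr * grad).detach().requires_grad_(True)
    return y.detach()

model = ToyEnergyModel(dim=8)
x = torch.randn(4, 8)
y_hat = predict_by_energy_minimization(model, x)

Allowing more minimization steps is what corresponds to spending more computation at inference time; the paper's reported gains come from its actual Transformer-based energy models, not from a toy like this.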
Slack link: ml-ka.slack.com, channel: #pdg. Please join us -- if you cannot join, please message us here or email mlpaperdiscussiongroupka@gmail.com.
In the Paper Discussion Group (PDG) we discuss recent and fundamental papers in the area of machine learning on a weekly basis. If you are interested, please read the paper beforehand and join us for the discussion. If you have not fully understood the paper, you can still participate – everyone is welcome! You can join the discussion or simply listen in. The discussion is in German or English depending on the participants.