
The Current State of Interpretability and Ideas for Scaling Up

Hosted By
Sophia A.

Details

Interpretability has delivered tools that researchers can use to predict, control, and understand the behavior of deep learning models in limited domains.

Now is the time to automate and scale these methods to provide a more comprehensive understanding of general-purpose capabilities. But the current paradigm of sparse autoencoders fails to make use of the tools and theories from causality that are key to mechanistic understanding.

BuzzRobot guest Atticus Geiger, a Stanford graduate who leads the nonprofit interpretability research lab Pr(Ai)²R Group, argues for an alternative route that leverages interventional data (i.e., hidden representations recorded after an intervention has been performed) to scale the task of controlling and understanding a deep learning model.
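To make the idea of interventional data concrete, here is a minimal sketch of an interchange-style intervention on a toy PyTorch model: a hidden representation computed on one input is patched into the forward pass on another input, and the output under that intervention is the interventional data. The model, layer choice, and inputs are illustrative assumptions, not code from the talk or from the Pr(Ai)²R Group.

```python
# Minimal sketch of an interchange intervention (activation patching) on a toy model.
# All names and shapes here are hypothetical, chosen only to illustrate the idea.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),  # hidden layer whose representation we intervene on
    nn.Linear(8, 2),
)

base_input = torch.randn(1, 4)    # input whose behavior we want to analyze
source_input = torch.randn(1, 4)  # counterfactual input supplying the patched activation

# 1. Record the hidden representation produced by the source input.
cache = {}
def save_hidden(module, inputs, output):
    cache["hidden"] = output.detach()

hook = model[1].register_forward_hook(save_hidden)
model(source_input)
hook.remove()

# 2. Run the base input, but overwrite the hidden layer with the cached activation.
def patch_hidden(module, inputs, output):
    return cache["hidden"]  # returning a tensor replaces this module's output

hook = model[1].register_forward_hook(patch_hidden)
patched_logits = model(base_input)  # interventional data: output under the intervention
hook.remove()

print("original output:", model(base_input))
print("patched output: ", patched_logits)
```

Comparing the original and patched outputs indicates how much the intervened hidden representation contributes to the model's behavior on the base input.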

Join BuzzRobot Slack to connect with the community

BuzzRobot
Online event
This event has passed