
The Current State of Interpretability and Ideas for Scaling Up

Hosted By
Sophia A.

Details

Interpretability has delivered tools that researchers can use to predict, control, and understand the behavior of deep learning models in limited domains.

Now is the time to automate and scale these methods to provide a more comprehensive understanding of general-purpose capabilities. But the current paradigm of sparse autoencoders fails to make use of the tools and theories from causality that are key to mechanistic understanding.

BuzzRobot guest Atticus Geiger, a Stanford graduate who leads the nonprofit interpretability research lab Pr(Ai)²R Group, argues for an alternative route that leverages interventional data (i.e., hidden representations recorded after an intervention has been performed) to scale the task of controlling and understanding a deep learning model.
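To make the idea of interventional data concrete, here is a minimal sketch of an interchange-style intervention on a toy PyTorch model: a hidden representation computed on one input is patched into the forward pass on another input, and the output under that intervention is the interventional data. The model, layer choice, and inputs are illustrative assumptions, not code from the talk or from the Pr(Ai)²R Group.

```python
# Minimal sketch of an interchange intervention (activation patching) on a toy model.
# All names and shapes here are hypothetical, chosen only to illustrate the idea.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),  # hidden layer whose representation we intervene on
    nn.Linear(8, 2),
)

base_input = torch.randn(1, 4)    # input whose behavior we want to analyze
source_input = torch.randn(1, 4)  # counterfactual input supplying the patched activation

# 1. Record the hidden representation produced by the source input.
cache = {}
def save_hidden(module, inputs, output):
    cache["hidden"] = output.detach()

hook = model[1].register_forward_hook(save_hidden)
model(source_input)
hook.remove()

# 2. Run the base input, but overwrite the hidden layer with the cached activation.
def patch_hidden(module, inputs, output):
    return cache["hidden"]  # returning a tensor replaces this module's output

hook = model[1].register_forward_hook(patch_hidden)
patched_logits = model(base_input)  # interventional data: output under the intervention
hook.remove()

print("original output:", model(base_input))
print("patched output: ", patched_logits)
```

Comparing the original and patched outputs indicates how much the intervened hidden representation contributes to the model's behavior on the base input.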

Join BuzzRobot Slack to connect with the community

BuzzRobot
Online event
This event has passed