#17 Mechanistic interpretability (Part 2 of 2)

Name: #17 Mechanistic interpretability (Part 2 of 2)
Start: 2025-09-29T18:00:00+01:00
End: 2025-09-29T20:00:00+01:00
Location: The Castle Inn

Hosted By

Frank M.


Details

Please read both papers in advance. Ideally, also bring your own copies to refer to during the session.

** 29th September 2025: Mechanistic interpretability (2) **

When: Monday 29th September 2025, 6pm – 8pm.

Where: The Castle Inn, 36 Castle Street, Cambridge, UK. (Most likely we'll be at one of the large tables upstairs.)

Papers:

[1] E. Ameisen et al. Circuit Tracing: Revealing Computational Graphs in Language Models. In Transformer Circuits Thread, 2025.
URL:
https://transformer-circuits.pub/2025/attribution-graphs/methods.html

[2] J. Lindsey et al. On the Biology of a Large Language Model. In Transformer Circuits Thread, 2025.
URL:
https://transformer-circuits.pub/2025/attribution-graphs/biology.html

In this second of two sessions, we look further into Anthropic's recent attempts to perform a "brain scan" on a large language model and interpret its thoughts. This session will cover the use of cross-layer transcoders to identify neural "circuits", which reveal some of the internal mechanisms by which a frontier AI model (Claude 3.5 Haiku) formulates its responses to user prompts.
