
Details

This is a ticketed event. You can register here.

Could an AI company’s internal coding agents create a “rogue deployment”, a set of agents running without human knowledge or permission? In February and March 2026, METR, the organization behind the time horizons graph, conducted a pilot of a process to assess just that. Anthropic, Google DeepMind, Meta, and OpenAI gave us access to their most capable internal LLMs and a wide range of non-public information. We concluded that, while internal agents plausibly had the means, motive, and opportunity to start small rogue deployments, they didn’t have the means to avoid human detection indefinitely.

METR researcher Thomas Broadley explains the process, the six key facts that informed our conclusion, and how we expect risk to evolve over the next few months.

You can watch a livestream of the talk here.
