Emergent Misalignment in LLMs


Details
We'll have a table or two at the pub, and I'll bring a sign so you can see who we are!
We will be discussing the paper "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs" by Betley et al.
Abstract snippet: "In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment."
If the paper looks interesting to you, whether you have ideas to share or just want to listen, please come along!