About us
We are a group for discussing rationality, philosophy, decision making, AI safety, and anything else that strikes our fancy.
If you enjoy reading LessWrong or SSC/ACX, or simply can't help but question the meaning of life, get in touch!
Upcoming events (1)
AI paper club: The assistant axis in persona space
Condeco Cafe, Fredsgatan 14, Göteborg, SE
When you chat with an AI assistant, it usually acts helpful and professional. But sometimes things get weird: the model starts speaking in a mystical tone, claims to be something else entirely, or drifts into bizarre behavior. What's going on under the hood?
A recent Anthropic paper digs into the geometry of "personas" inside language models. The authors find that diverse character types (Ghost, Sage, Nomad, Demon...) cluster along a primary axis, and at one end of it sits the helpful Assistant we're familiar with.
We'll discuss the paper, what it tells us about how RLHF actually shapes models, and what it might mean for alignment.
Please read this paper to prepare for the session:
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
(A shorter summary can be found here.)
5 attendees
Past events (12)