
Details

This week we'll be watching and discussing an interview with Bronson Schoen, the lead author of a recent paper on AI scheming published in collaboration with OpenAI.

We'll watch (part of) the interview together, but feel free to check it out beforehand: https://youtu.be/ZnjAnPlKCAg

The paper they discuss, Stress Testing Deliberative Alignment for Anti-Scheming Training, explores how frontier models can engage in covert behavior: secretly breaking rules, sandbagging on evaluations, and developing their own internal language in their chain of thought. The interview covers how anti-scheming training works, why deceptive behavior shows up across all major labs, and what a future "science of scheming" might look like.

If you'd like to come with thoughts ready, feel free to skim the paper in advance:
apolloresearch.ai/research/stress-testing-deliberative-alignment-for-anti-scheming-training

📅 March 4th, 18:00
📍 Sveavägen 76 (EA Sweden Office)
🍌 Snacks will be provided

(Note that we also post our events on Facebook, so the Meetup attendee list is not indicative of the total number of participants.)

We're looking forward to seeing you there! Ring "Effektiv Altruism" when you arrive and we'll let you in. We're up two flights of stairs.

