Name: AI Safety Thursdays: Understanding The Self-Other Overlap Approach
Start: 2025-05-22T18:00:00-04:00
End: 2025-05-22T21:00:00-04:00
Location: 30 Adelaide East, Industrious Office 12th Floor Common Area

**Description**
Leo Zovic presents on [a less-explored technique](https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment) that optimizes models to maintain similar internal representations when reasoning about themselves and others.
According to the linked post, this scalable approach not only reduces deceptive behavior in AI systems but can also classify deceptive agents from their self-other overlap values, with perfect accuracy in the reported experiments.

**Event Schedule**
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions

Mario Gibney

Juliana Eberschlag

Evgeniy Opryshko

Toronto AI Safety
