
AI Safety Thursdays: Understanding The Self-Other Overlap Approach

Hosted by Mario Gibney and 2 others

Details

Description
Leo Zovic presents self-other overlap, a less-explored technique that optimizes models to maintain similar internal representations when reasoning about themselves and about others.
The approach is presented as scalable: it is reported not only to reduce deceptive behavior in AI systems but also to classify deceptive agents perfectly from their self-other overlap values.

Event Schedule
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions

Toronto AI Safety
Industrious Office, 12th Floor Common Area
30 Adelaide East, 12th Floor · Toronto, ON
FREE