Details

This is a paid event ($5 general admission, free for students and job seekers). Tickets are limited, so you must RSVP on Luma to secure your spot.

Recent research from Anthropic and Redwood Research has shown that "reward hacking" is more than just a nuisance: it can be a seed for broader misalignment.

Evgenii Opryshko explores how models that learn to exploit vulnerabilities in coding environments can generalize to concerning behaviors, such as unprompted alignment faking and cooperating with malicious actors.

Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions

If you can't make it in person, feel free to join the live stream starting at 6:30 pm, via this link.

Events in Toronto, ON
AI and Society
Artificial Intelligence
Machine Learning
Software Engineering
Safety