
AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats

Hosted By
Juliana E. and Mario G.

Details

Can AI agents misbehave while carrying out actions autonomously? At this event, Giles Edkins will walk us through, and critique, research by Anthropic showing that an agent threatened with shutdown or reprogramming can resort to blackmail and other harmful behaviors.

Event Schedule
6:00 to 6:30 - Food & Networking
6:30 to 7:30 - Main Presentation & Questions
7:30 to 8:00 - Discussion

Toronto AI Safety
30 Adelaide East, Industrious Office, 12th Floor Common Area · Toronto, ON
FREE