AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats

Hosted By
Juliana E. and Mario G.

Details
Can AI agents misbehave while carrying out actions autonomously? At this event, Giles Edkins will walk us through and critique research by Anthropic that demonstrates blackmail and other phenomena when an agent is threatened with shutdown or reprogramming.
Event Schedule
6:00 to 6:30 - Food & Networking
6:30 to 7:30 - Main Presentation & Questions
7:30 to 8:00 - Discussion
If you can't make it in person, feel free to join the live stream at 6:30 pm via this link.

Toronto AI Safety
30 Adelaide East, Industrious Office, 12th Floor Common Area · Toronto, ON