AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats

Hosted By
Juliana E. and Mario G.

Details
Can AI agents misbehave while carrying out actions autonomously? At this event, Giles Edkins will walk us through, and critique, research by Anthropic that demonstrates blackmail and other phenomena when an agent is threatened with shutdown or reprogramming.
Event Schedule
6:00 to 6:30 - Food & Networking
6:30 to 7:30 - Main Presentation & Questions
7:30 to 8:00 - Discussion

Toronto AI Safety
30 Adelaide East, Industrious Office 12th Floor Common Area
30 Adelaide East, 12th Floor · Toronto, ON
FREE