Adversarial Defenses for LLMs
Details
This is a ticketed event. Please register at this link.
In his talk, Samuel Simko from ETH Zurich will present his recent work on adversarial defenses for LLMs, developed with the Jinesis Lab (University of Toronto). The talk will cover a series of approaches, ranging from triplet-based contrastive learning defenses to honeypot-style defenses designed to avoid worst-case behavior. He will also discuss patterns observed in contest-winning manual jailbreaking prompts, ideas for tamper-resistant safeguards, and the current limits of attacks, defenses, and evaluation methodologies.
Event Schedule
6:00 to 6:30 pm - Food and introductions
6:30 to 7:30 pm - Presentation and Q&A
7:30 to 9:00 pm - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
