Universal and Transferable Attacks on Aligned Language Models

Hosted by Sophia A.

Details

In this talk, Andy Zou will present the findings of the research paper "Universal and Transferable Attacks on Aligned Language Models." He will describe a simple and effective attack method that causes aligned language models to generate objectionable behaviors.

BuzzRobot
Online event
This event has passed