Universal and Transferable Attacks on Aligned Language Models

Hosted By
Sophia A.

Details
In this talk, Andy Zou will present the findings of a research paper on Universal and Transferable Attacks on Aligned Language Models. He will describe a simple and effective attack method that causes aligned language models to generate objectionable behaviors.

BuzzRobot
Online event