Universal and Transferable Attacks on Aligned Language Models
Details
In this talk, Andy Zou will present the findings of the research paper "Universal and Transferable Attacks on Aligned Language Models". He will describe a simple and effective attack method that causes aligned language models to exhibit objectionable behaviors.
Artificial Intelligence
Deep Learning
Machine Learning
Data Science
