Name: Universal and Transferable Attacks on Aligned Language Models
Start: 2023-10-19T10:00:00-07:00
End: 2023-10-19T12:00:00-07:00

In this talk, Andy Zou will present the findings of the research paper Universal and Transferable Attacks on Aligned Language Models. He will describe a simple and effective attack method that causes aligned language models to exhibit objectionable behaviors.

Sophia Aryan

BuzzRobot

Technology

Artificial Intelligence

Machine Learning

Data Science

Deep Learning

Data Science using Python

Deep Reinforcement Learning

Machine Learning with Python

Computer Programming

Programming Languages

Open Source Python

JavaScript

Golang

Open Source

Andy Zou is a PhD student in the Computer Science Department at CMU, advised by Zico Kolter and Matt Fredrikson. He is also a cofounder of the Center for AI Safety ([safe.ai](http://safe.ai/)).

Andy Zou

Online event
