[Guest] Domenic Rosati - actually defending against harmful finetuning
Details
Getting here: Enter the lobby at 100 University Ave (right next to St Andrew subway station), and message Giles Edkins on the meetup app or call him on 647-823-4865 to be let up to room 6H.
We welcome back Domenic Rosati. Last time, we heard about the problem of harmful finetuning and a specification for what it might mean for a model to be resistant to it. Now Domenic is back with a new paper explaining how to actually defend against these attacks. This should be good news for anyone wanting to release a language model openly while locking down harmful capabilities.
May be somewhat technical!
We welcome a variety of backgrounds, opinions and experience levels.
Every week on Thursday