[Guest] Domenic Rosati - actually defending against harmful finetuning
Details
Getting here: Enter the lobby at 100 University Ave (right next to St Andrew subway station), and message Giles Edkins on the meetup app or call him on 647-823-4865 to be let up to room 6H.
We welcome back Domenic Rosati. Last time, we heard about the problem of harmful finetuning and a specification for what it might mean for a model to be resistant to it. Now Domenic is back with a new paper explaining how to actually defend against these attacks. This should be good news for anyone wanting to release a language model openly while locking down harmful capabilities.
May be somewhat technical!
We welcome a variety of backgrounds, opinions and experience levels.
Every week on Thursday