Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Details
This is a ticketed event. Please register at this link.
Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models do not, on their own, make the broader ecosystem safe, and how anyone with access to an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open discussion
If you can't attend in person, you can join the live stream, starting at 6:30 pm, via this link.
Events in Toronto, ON
