Name: Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Start: 2026-04-02T18:00:00-04:00
End: 2026-04-02T21:00:00-04:00
Location: 30 Adelaide East, Industrious Office 12th Floor Common Area

This is a ticketed event. Please register at [this link](https://luma.com/sdhf0qqc).

Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models don’t actually make the ecosystem safe, and how anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.

**Event Schedule**
- 6:00 to 6:30 - Food and introductions
- 6:30 to 7:30 - Presentation and Q&A
- 7:30 to 9:00 - Open Discussions

If you can't make it in person, feel free to join the live stream, starting at 6:30 pm, via [this link](https://www.youtube.com/@Trajectory-Labs/live).

Georgia Berg

Mario Gibney

Toronto AI Safety

