Skip to content

Evolution of AI Safety mitigations

Photo of Emile Delcourt
Hosted By
Emile D.
Evolution of AI Safety mitigations

Details

Join us at AI Safety Awareness Group Boston for an insightful session to catch up on the state of AI mitigations in a world-café format!

Whether you're new to combining traditional security practices with machine learning (like applying threat modeling to AI systems), or looking for like-minded experts, this event is for you. Connect with others interested in governance, interpretability, reinforcement learning, and other approaches to managing AI risks.

You'll be able to pick and choose among 3 discussion pods centered on levels of AI safety:

Established Mitigations for Immediate Risks

  • What vulnerabilities are common in large language models, and what are the minimum practices to avoid low-cost abuse?
  • Explore widely-adopted solutions and understand their current limitations and best practices
  • Many vendors (e.g. in OWASP's solution landscape) offer mitigations against severe risks like the OWASP's Top 10 for LLMs 2025 risks in large language models, for instance mitigating prompt injection, jailbreaks, and data poisoning. Do they work? What are must-haves?
  • Reminders in NIST 800-218A Secure Software Development Framework (SSDF) reiterate many critical security practices throughout the AI development lifecycle: it's still essential to know where a model was built, implement robust access controls, threat modeling and continuous monitoring, output sanitization, and only to expose what is needed (e.g. structured responses, function calling).

Emerging Solutions & Advanced Threats

  • Given risks like model inversion and jailbreaks, should we expose privileged access or confidential data in training or to agents that process untrusted information?
  • Dive into cutting-edge approaches that show promising results to stop exposures and motivated attacks, and what challenges they face in practice.
  • Discussion participants might explore practical takes on the effectiveness of scalable oversight methods for model development, the feasibility and appropriate contexts to roll out evals or mitigations from the latest OWASP report for agentic/multi-agentic systems, or takeaways from iterative prompt evaluation or constitutional classifiers.

Research Frontiers & Existential Risks

  • If AGI is widely available within the next 2 years, could any paradigms or architectures provably favor positive safety outcomes?
  • This track will discuss novel theoretical findings, frameworks, and experimental techniques with a high potential to make managing existential risks tractable

Ahead of the event, you may benefit most from this session by
a) reflecting 30min on the kind of world you want to live in in 5-6 years
b) checking out our recommended readings/courses/resources regarding AI risks and mitigations.

Photo of AI Safety Awareness Group Boston group
AI Safety Awareness Group Boston
See more events
Panorama Education
24 School St. (4th floor), near Park St. station · Boston, ma