Details

How can we avoid A.I. disasters? The plan so far is mostly to test, under controlled conditions, the extent to which A.I.s could cause catastrophic harms. However, this approach has obvious problems, both technical ones and ones that stem from its limited scope.

I'll give an overview of the work my team at Anthropic did to evaluate risks from models feigning incompetence, colluding, or sabotaging human decision-making. I'll also discuss “control” techniques, which use A.I.s to monitor other A.I.s and to set traps that reveal bad behavior. Finally, I'll outline the main problems beyond the scope of these approaches, in particular that of robustly aligning our institutions to human interests.

About the Speaker:

David Kristjanson Duvenaud is an associate professor in the Departments of Computer Science and Statistical Sciences at the University of Toronto, where he holds a Schwartz Reisman Chair in Technology and Society. A leading voice in AI safety and artificial general intelligence (AGI) governance, Duvenaud’s current work focuses on evaluating dangerous capabilities in advanced AI systems, mitigating catastrophic risks from future models, and developing institutional designs for post-AGI futures.

Duvenaud’s early work helped shape the field of probabilistic deep learning, with contributions including neural ordinary differential equations, gradient-based hyperparameter optimization, and generative models for molecular design. He has received numerous honors, including the Sloan Research Fellowship and best paper awards at NeurIPS, ICML, and ICFP. Before joining the University of Toronto, Duvenaud was a postdoctoral fellow in the Harvard Intelligent Probabilistic Systems group and completed his PhD at the University of Cambridge, studying Bayesian nonparametrics with Carl Rasmussen and Zoubin Ghahramani.

_________________________________________________________

This is an online talk and audience Q&A presented by the University of Toronto's Schwartz Reisman Institute for Technology and Society. It is open to the public and held on Zoom.

The featured speaker will present for 45 minutes, followed by an open discussion with participants.

About the Schwartz Reisman Institute for Technology and Society:

The Schwartz Reisman Institute for Technology and Society is a research institute at the University of Toronto that explores the ethical and societal implications of technology. Our mission is to deepen our understanding of technologies, societies, and what it means to be human by integrating research across traditional boundaries and building practical, human-centred solutions that really make a difference.

We believe humanity still has the power to shape the technological revolution in positive ways, and we’re here to connect and collaborate with the brightest minds in the world to make that belief a reality. The integrative research we conduct rethinks technology’s role in society, the contemporary needs of human communities, and the systems that govern them. We’re investigating how best to align technology with human values and deploy it accordingly.

The human-centred solutions we build are actionable and practical, highlighting the potential of emerging technologies to serve the public good while protecting citizens and societies from their misuse.

The institute will be housed in the new $100 million Schwartz Reisman Innovation Centre currently under construction at the University of Toronto.

Artificial Intelligence
Ethics
Science
Political Philosophy
Technology