When building and implementing AI systems, it’s important to ensure that a system's values are aligned with human values. Those humans could be its creators, its operators, or, better yet, society at large.
While this may sound like a distant challenge, destined for sci-fi movies and comic books, the same challenges are already present in today's AI systems. Applied research can uncover more robust methods for controlling and predicting the outputs of increasingly advanced AI agents, and for ensuring those outputs align with human values.
Rohin Shah (tentative)
Rohin is a 6th year PhD student in Computer Science working at the Center for Human-Compatible AI (CHAI) at UC Berkeley. His general interests in CS are very broad, including AI, machine learning, programming languages, complexity theory, algorithms, and security, and so he started his PhD working on program synthesis. However, he was convinced that it is really important for us to build safe, aligned AI, and so moved to CHAI at the start of his 4th year. He now thinks about how to provide specifications of good behaviour in ways other than reward functions, especially ones that do not require much human effort. He is best known for the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment that has over 1600 subscribers.
Subscribe to Rohin's newsletter here >> http://rohinshah.com/alignment-newsletter/
Lewis Hammond
Lewis is a 1st year DPhil student in computer science at the University of Oxford and a DPhil Affiliate at the Future of Humanity Institute (FHI). He is primarily interested in how two major paradigms within artificial intelligence, symbolic and statistical, can be combined in order to make systems safer, more interpretable, and provably beneficial for society as a whole. His current research falls at the intersection of game theory, machine learning, and formal verification.
If you want to know more you can find him online at >> www.lewishammond.com
Presentation: Modelling Agent Incentives
Abstract: How can we best reason about the many ways a powerful AI agent might try to observe or influence its environment?
Several researchers at DeepMind and FHI (amongst other places) have attempted to formalise the notion of agent incentives using particular kinds of probabilistic graphical model known as causal influence diagrams (CIDs) and, more recently, structural causal influence models (SCIMs). I will give a brief, non-technical introduction to this theory and indicate how it can be used to capture and elucidate multiple problems in aligning powerful AI, a few of which can already be observed in today's systems.
Presentation: Bayesian Reinforcement Learning
Abstract: It's hard to know how advanced AI will behave, since we don't have any programs for advanced AI that we can run. But there are algorithms for advanced AI that are too slow to run; we know they'd be intelligent given things we can prove about them. These algorithms are within the scope of Bayesian reinforcement learning. I work on designing Bayesian reinforcement learners that we could expect to be safe. If we could not come up with a design for an AI that is safe and does optimal reasoning, we could hardly be confident that we could create an AI that is safe and does merely superhuman reasoning.
There might even be a little introduction from yours truly (Ben), but I’ll do my best to leave the majority of the speaking to the people who actually know what they’re talking about.