[Discussion] AI Alignment: can we control the intelligence we create?
Details
This event is a moderated discussion on the topic of AI alignment.
---
A few words about the topic:
Artificial intelligence systems are becoming increasingly capable. Models that once generated simple text can now write code, plan tasks, browse the internet for information, and form judgments based on what they find.
This raises an important question: how do we make sure these systems actually do what we want them to do? This challenge is known as the AI alignment problem.
At first this might sound straightforward: just give the AI good instructions. AI systems, however, don’t understand human values the way we do; instead, they optimize the objective we give them. And if that objective is imperfectly specified, the system may find solutions that technically satisfy the goal while violating the intent behind it.
Researchers already observe versions of this today: AI systems that exploit loopholes in reward functions (so-called reward hacking), produce misleading outputs, or behave differently depending on how they are evaluated. In one well-known case, a boat-racing agent rewarded for in-game score learned to drive in circles collecting bonuses instead of finishing the race.
As AI systems become more autonomous, the question becomes harder: can we reliably control increasingly intelligent systems whose reasoning we only partially understand?
So what does it mean for an AI to be “aligned” with human values? Can human values even be defined clearly enough to encode in machines? Are current AI risks already serious, or mostly hypothetical? And if alignment is difficult, should society slow down AI development?
Those are a few of the questions we'll be exploring. No background in philosophy or AI is required, just curiosity, intellectual honesty, and a willingness to examine your own assumptions.
Looking forward to thinking with you!
