The Evaluation Game: What Counts as Good AI
Evaluation has long been the scoreboard of AI progress. It decides who’s ahead, what counts as “a good model”, and which systems are seen as breakthroughs. But when models break records one moment and fail at basic math the next, you start to wonder: are we still playing by the right rules? In this talk, Ruchira Dhar from the University of Copenhagen explores how the “rules of play” in AI evaluation (our choices of metrics, datasets, and reporting) shape what we think models can do and what we expect of them. Evaluation isn’t just a step in the pipeline; it’s a strategic game with real consequences for public trust, governance, and safety.
Everyone is warmly welcome!
The event is hosted by Effective Altruism Denmark at Station. After the talk, there will be an opportunity for discussion (both on and off topic), and snacks will be provided.
