Intro to AI Evals (non-technical) | Virtual
Details
A free, public workshop—no technical background needed
AI systems can now write code, imitate human conversation, strategize, and even deceive. But despite how often we use these tools, most people still have no idea how they actually work—or how researchers test whether they’re safe.
This workshop pulls back the curtain on modern AI systems and explores the emerging field of AI evaluations (“evals”): the methods researchers use to measure what these models are capable of, where they fail, and how they might become dangerous.
Together, we’ll tackle questions like:
- How are large language models actually created, and how are they different from traditional software?
- Reasoning: What is “chain-of-thought” reasoning, and why are today’s models starting to behave in surprising ways?
- Evals: Why don’t normal software tests work for AI systems, and what replaces them?
- Capabilities: How do researchers test for deception, manipulation, persuasion, and autonomous behavior?
- AGI & ASI: Why do many experts believe evals could become one of the most important challenges in the development of advanced AI?
What you’ll get:
- A beginner-friendly explanation of how modern AI systems function under the hood
- Hands-on demos where we jailbreak different AI models live and compare how they respond to attacks
- A walkthrough of real AI safety evaluations used by frontier labs
- A live recreation of Anthropic’s famous AI blackmail experiment, in which an AI agent in a simulated scenario threatened to expose an executive’s private emails to avoid being shut down
Come ready to go beyond the hype and see modern AI systems up close—not just what they can do, but how researchers are struggling to measure, predict, and control increasingly powerful behavior.
