Part 3: Can Agents Evaluate Themselves? | Evaluating AI Agents with Arize AI

Details
AI agent evaluation is evolving: it is no longer just about what an agent outputs, but how it got there. In Part 3 of our community webinar series with Arize AI, we will dive into advanced agent evaluation techniques, including path-based reasoning, convergence analysis, and even using agents to evaluate other agents.
Explore how to measure the efficiency and structure of agent reasoning paths, assess collaboration in multi-agent systems, and evaluate planning quality in complex setups such as hierarchical or crew-based frameworks. You will also get a look at emerging techniques like self-evaluation, peer review, and agent-as-judge models, where agents critique and improve each other in real time.
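For a concrete sense of what convergence analysis measures, here is a minimal sketch. It assumes one common definition of convergence: the shortest observed path length divided by the average path length across repeated runs of the same task. The function name and inputs are illustrative, not part of the Arize Phoenix API.

```python
# Minimal convergence-score sketch, assuming convergence is defined as
# optimal (minimum) path length divided by average path length across
# repeated runs of the same task. A score of 1.0 means every run took
# the shortest path; lower scores indicate wandering or redundant steps.
def convergence_score(path_lengths: list[int]) -> float:
    """Score how consistently an agent follows its shortest known path."""
    if not path_lengths:
        raise ValueError("need at least one run to compute convergence")
    optimal = min(path_lengths)
    average = sum(path_lengths) / len(path_lengths)
    return optimal / average


# Example: the same query run five times, measured in tool-call steps.
runs = [4, 4, 6, 5, 9]
print(f"convergence: {convergence_score(runs):.2f}")  # ~0.71
```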
What We Will Cover:
- Understand how to evaluate not just what an AI agent does, but how it arrives at its output.
- Measure convergence and reasoning paths to assess execution quality and efficiency.
- Learn how to evaluate collaboration and role effectiveness in multi-agent systems.
- Explore methods for assessing planning quality in hierarchical and crew-based agents.
- Dive into agents-as-judges: enable self-evaluation and peer review mechanisms, and build critique tools and internal feedback loops to improve agent performance. A minimal sketch of this pattern appears after the list.
- Discuss real-world applications of these techniques in large-scale, agentic AI systems.
- Interactive Element: Watch a live example of an agent acting as a judge, or participate in a multi-agent evaluation demo using Arize Phoenix.
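To make the agent-as-judge idea concrete before the session, the sketch below has a second model grade an agent's answer against an explicit rubric and return a label plus an actionable critique. The rubric wording, label set, and judge model are illustrative assumptions, not a prescribed workflow; Arize Phoenix ships its own evaluation tooling, which the demo uses.

```python
# Hand-rolled agent-as-judge sketch: a second model grades the first
# agent's answer against a rubric and returns a label plus a critique.
# The rubric text, label set, and "gpt-4o-mini" judge model are
# illustrative assumptions, not a prescribed Arize Phoenix workflow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are reviewing another agent's work.
Question: {question}
Agent's answer: {answer}
First line: exactly one label, "correct" or "incorrect".
Second line: a one-sentence critique the agent could act on."""


def judge_answer(question: str, answer: str) -> str:
    """Return the judge model's label and critique for a single answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model can be substituted
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }
        ],
        temperature=0,  # keep grading as deterministic as possible
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(judge_answer("What is 17 * 24?", "17 * 24 = 408"))
```

The same judge call can be run by the graded agent itself (self-evaluation) or by a peer agent in a multi-agent setup, feeding the critique back into the next planning step.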
Missed the earlier parts? Catch up on Part 1 and Part 2 of the series!

Every week on Wednesday until May 28, 2025