
Part 3: Can Agents Evaluate Themselves? | Evaluating AI Agents with Arize AI

Hosted By
Nabeeha

Details

AI agent evaluation is evolving: it's no longer just about what an AI agent outputs, but about how it got there. In this Part 3 webinar of the community series with Arize AI, we dive into advanced agent evaluation techniques, including path-based reasoning, convergence analysis, and even using agents to evaluate other agents.

Explore how to measure the efficiency and structure of agent reasoning paths, assess collaboration in multi-agent systems, and evaluate the quality of planning in complex setups like hierarchical or crew-based frameworks. You will also get a look at emerging techniques like self-evaluation, peer review, and agent-as-judge models — where agents critique and improve each other in real time.
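To give a concrete flavor of the agent-as-judge pattern mentioned above, here is a minimal sketch in which one model grades another agent's answer and returns a critique. The prompt text, the judge_answer helper, and the grading rubric are illustrative assumptions, not part of the webinar material; Arize Phoenix's evals library offers a fuller, production-ready version of this pattern.

```python
# Minimal agent-as-judge sketch: one LLM call grades another agent's answer.
# The prompt, helper name, and rubric are illustrative, not an official API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are evaluating another agent's response.
Question: {question}
Agent response: {answer}
On the first line reply with exactly one word, "correct" or "incorrect".
On the second line give a one-sentence critique."""


def judge_answer(question: str, answer: str, model: str = "gpt-4o-mini") -> dict:
    """Ask a judge model to label and critique an agent's answer."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    text = completion.choices[0].message.content.strip()
    label, _, critique = text.partition("\n")
    return {"label": label.strip().lower(), "critique": critique.strip()}


if __name__ == "__main__":
    verdict = judge_answer("What is 17 * 6?", "17 * 6 = 102")
    print(verdict)
```

In practice the same critique can be fed back to the original agent as an internal feedback loop, which is the self-improvement angle the session explores.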

What We Will Cover:

  • Understand how to evaluate not just what an AI agent does, but how it arrived at its output.
  • Measure convergence and reasoning paths to assess execution quality and efficiency (see the convergence sketch after this list).
  • Learn how to evaluate collaboration and role effectiveness in multi-agent systems.
  • Explore methods for assessing planning quality in hierarchical and crew-based agents.
  • Dive into agents-as-judges: enable self-evaluation and peer review, and build critique tools and internal feedback loops that improve agent performance.
  • Discuss real-world applications of these techniques in large-scale, agentic AI systems.
  • Interactive Element: Watch a live example of an agent acting as a judge, or participate in a multi-agent evaluation demo using Arize Phoenix.
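
To make the convergence bullet above more concrete, the sketch below scores each run of the same task by the ratio of the shortest observed path to that run's path length, so the most direct run scores 1.0 and longer, wandering runs score lower. This is a simplified, assumed formulation for illustration, not necessarily the exact metric presented in the session.

```python
# Illustrative convergence metric: optimal_steps / actual_steps per run.
from statistics import mean


def convergence_scores(step_counts: list[int]) -> list[float]:
    """Score each agent run by the shortest observed step count / its own step count."""
    optimal = min(step_counts)
    return [optimal / steps for steps in step_counts]


if __name__ == "__main__":
    # Number of tool-call / reasoning steps each run of the same task took.
    runs = [4, 4, 6, 9, 5]
    scores = convergence_scores(runs)
    print("per-run convergence:", [round(s, 2) for s in scores])
    print("average convergence:", round(mean(scores), 2))
```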

Missed the earlier parts? Catch up on Part 1 and Part 2 of the series!

Data Science Dojo- Phoenix

Every week on Wednesday until May 28, 2025

Online event
Link visible for attendees
FREE