
Part 3: Can Agents Evaluate Themselves? | Evaluating AI Agents with Arize AI

Hosted By
Nabeeha

Details

AI agent evaluation is evolving: it is no longer just about what an AI agent outputs, but how it got there. In Part 3 of this community webinar series with Arize AI, we will dive into advanced agent evaluation techniques, including path-based reasoning, convergence analysis, and even using agents to evaluate other agents.

Explore how to measure the efficiency and structure of agent reasoning paths, assess collaboration in multi-agent systems, and evaluate the quality of planning in complex setups like hierarchical or crew-based frameworks. You will also get a look at emerging techniques like self-evaluation, peer review, and agent-as-judge models — where agents critique and improve each other in real time.
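For orientation, here is a minimal sketch of one common way to score convergence on a task: the shortest observed reasoning path divided by the average path length across repeated runs, so a score near 1.0 means the agent reliably takes an efficient route. The function name and the run format are illustrative assumptions for this listing, not part of the webinar material or any Arize Phoenix API.

```python
from statistics import mean

def convergence_score(path_lengths: list[int]) -> float:
    """Convergence of an agent on a single task across repeated runs.

    path_lengths: number of steps (tool calls / reasoning hops) the agent
    took in each run of the same task. A score near 1.0 means most runs
    match the shortest observed path; lower scores indicate wandering runs.
    Illustrative formulation: shortest path length / average path length.
    """
    if not path_lengths:
        raise ValueError("need at least one recorded run")
    return min(path_lengths) / mean(path_lengths)

# Example: five runs of the same task took 4, 4, 5, 7, and 4 steps.
print(convergence_score([4, 4, 5, 7, 4]))  # 4 / 4.8 ≈ 0.83
```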

What We Will Cover:

  • Understand how to evaluate not just what an AI agent does, but how it arrived at its output.
  • Measure convergence and reasoning paths to assess execution quality and efficiency.
  • Learn how to evaluate collaboration and role effectiveness in multi-agent systems.
  • Explore methods for assessing planning quality in hierarchical and crew-based agents.
  • Dive into agents-as-judges: enable self-evaluation and peer-review mechanisms, and build critique tools and internal feedback loops that improve agent performance (see the sketch after this list).
  • Discuss real-world applications of these techniques in large-scale, agentic AI systems.
  • Interactive Element: Watch a live example of an agent acting as a judge, or participate in a multi-agent evaluation demo using Arize Phoenix.
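As a rough illustration of the agent-as-judge idea (not the exact demo shown in the session), the sketch below has one model grade another agent's answer and feed the critique back into a revision loop. `call_llm`, the prompt template, and the JSON schema are hypothetical stand-ins for whatever model client and rubric you use.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your chat-completion client
    (e.g. an OpenAI or open-weights model call)."""
    raise NotImplementedError

JUDGE_TEMPLATE = """You are reviewing another agent's work.
Task: {task}
Agent's answer: {answer}

Return JSON with keys:
  "verdict": "pass" or "fail",
  "critique": one or two sentences the agent can act on.
"""

def judge(task: str, answer: str) -> dict:
    """Ask a judge model to grade an agent's answer and return its critique."""
    raw = call_llm(JUDGE_TEMPLATE.format(task=task, answer=answer))
    return json.loads(raw)

def answer_with_feedback(task: str, draft: str, max_rounds: int = 2) -> str:
    """Simple internal feedback loop: revise the draft until the judge passes it."""
    for _ in range(max_rounds):
        review = judge(task, draft)
        if review["verdict"] == "pass":
            break
        draft = call_llm(
            f"Task: {task}\nPrevious answer: {draft}\n"
            f"Reviewer critique: {review['critique']}\n"
            "Rewrite the answer addressing the critique."
        )
    return draft
```

The same loop generalizes to peer review in multi-agent systems: each agent's output is routed through one or more judge agents before it is accepted or handed to the next role.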

Missed the earlier parts? Catch up on Part 1 and Part 2 of the series!

Data Science Dojo Karachi

Every week on Wednesday until May 28, 2025

Online event
FREE