Evaluating and Improving Performance of Agentic Systems with Snorkel.AI


Details
GenAI systems are evolving beyond basic information retrieval and question answering, becoming sophisticated agents capable of managing multi-turn dialogues and executing complex, multi-step tasks autonomously.
However, reliably evaluating and systematically improving their performance remains challenging. In this session, we'll explore methods for assessing the behavior of LLM-driven agentic systems, highlighting techniques for identifying performance bottlenecks and actionable insights for building better-aligned, more reliable agentic AI systems.
To ground the talk, we will showcase a brief demo of aligning automated evaluators with subject-matter expert (SME) judgment for agentic evaluation.
All attendees will be required to show their IDs at the front desk upon check-in at the AWS office located at 1620 26th Street, Suite 3000N, Santa Monica, CA 90404.
