AI Webinar Series (Virtual) - Evaluating AI Agent Reliability
Details
Important: Register on the event website to receive the joining link. (RSVPs on Meetup will NOT receive the joining link.)
This is a virtual event for our global AI community, so please double-check your local time.
Can't make it live? Register anyway to receive the webinar recording.
Description:
Welcome to the weekly AI Deep Dive Webinar Series. Join us for deep-dive tech talks on AI, hands-on code labs and workshops, and networking with speakers and fellow developers from all over the world.
Tech Talk: Evaluating AI Agent Reliability
Speakers: Anupam Datta (Snowflake) | Josh Reini (Snowflake)
Abstract: Agents often fail in ways you can’t see. They could return a final answer while taking a broken path: drifting from the goal, making irrational plan jumps, or misusing tools. Was the goal achieved efficiently? Did the plan make sense? Were the right tools used? Did the agent follow through?
These hidden mistakes silently rack up compute costs, spike latency, and cause brittle behavior that collapses in production. Traditional evals won’t flag any of it because they only check the output, not the decisions that produced it.
This session introduces the Agent GPA (Goal-Plan-Action) framework, available in the open-source TruLens library. In benchmark tests, the Agent GPA framework consistently outperformed standard LLM evaluators, giving teams scalable and trustworthy insight into agent behavior:
- 95% error detection (vs. 55% baseline methods)
- 86% accuracy in pinpointing where an error occurred (vs. 49% baseline methods)
- Human reviewers using the GPA framework caught 100% of the internal agent errors in the TRAIL/GAIA dataset.
You’ll learn how to inspect an agent’s reasoning steps, detect issues like hallucinations, bad tool calls, and missed actions, and leave knowing how to make your agent truly production-ready.
Speakers/Topics:
Stay tuned as we update speakers and schedules. If you are interested in speaking to our community, we invite you to submit topics for consideration: Submit Topics
More upcoming sessions:
- Jan 7th: AI Meetup (Virtual) - Agent Systems and LLM Compression Deep Dive
- Jan 15th: AI Deep Dive Webinar Series (Virtual)
Local and Global AI Community on Discord
Join us on Discord for the local and global AI tech community:
- Events chat: chat and connect with speakers and with global and local attendees;
- Learning AI: events, learning materials, study groups;
- Startups: innovation, project collaborations, founders/co-founders;
- Jobs and Careers: job openings, posted resumes, hiring managers.
