Do We Even Need Agentic Evals? - Live AMA with Hamza Tahir (CTO, ZenML)

Hosted by: Rod R.

## Details

Date: Thursday, September 11
Time: 4:00 PM IST (Ireland) / 4:00 PM BST (UK) / 11:00 AM ET / 8:00 AM PT
Format: Live AMA on the Jentic Community Discord (questions asked directly in Discord; answered live by Hamza & Rod)

## What this session is about

The industry is split: some teams ship fast with “vibes,” others build rigorous evaluation stacks. In this AMA, we’ll cut through the noise and get practical about when evals matter, when monitoring and A/Bs are enough, and how to pick the right level of rigor for agents in production.

## Topics we’ll cover

  • Clear definitions: research evals vs. engineering evals; offline vs. online; LLM-as-judge vs. human review
  • When to use what: smoke tests, regression suites, monitoring, and A/B tests for agents
  • Agentic specifics: long-running loops, tool use, stuckness, “silent” failures, and goal correctness
  • Error analysis that ships: turning traces into actionable evals without boiling the ocean
  • Goodharting & drift: avoiding metric gaming; keeping evals aligned to product KPIs
  • Coding agents vs. enterprise ops: why HITL domains tolerate lighter evals—and where you can’t cut corners
  • Starter kit: a minimal, sensible stack (dataset, judge, rubric, monitor, dashboard) you can adopt tomorrow

## Why attend

  • You’re building agent workflows and need a measured, production-ready approach to quality
  • You want to iterate faster without flying blind
  • You’re deciding between eval tools, building your own, or relying on monitoring + A/B in prod

## Speakers

  • Hamza Tahir - CTO, ZenML
  • Rod Rivera - Host, Jentic Community

## How the AMA works

  • Join the Jentic Community Discord (link in the registration confirmation).
  • Drop your questions in the live AMA channel. We’ll answer them in real time.
  • Can’t attend live? Register to get the recording.

## Prep (optional)

Provide one concrete scenario: your agent, the target outcome, and the current failure mode. We’ll map it to a minimal evaluation and monitoring plan during Q&A.

Cost: Free
Recording: Yes (shared with registrants)

Register now and bring your toughest “Do we even need evals?” question.

Agentic Workflows & AI Agents Meetup