Details

Judging the judge: Fixing conversational agent evaluation
with Akshay Anand, Tech Principal, Thoughtworks

Why do so many LLM and GenAI pilots stall before they ever hit production? The secret often lies in the evaluation process.

In this talk, we will break down how to design evaluation frameworks that turn experimental systems into reliable, production-ready tools. We’ll explore why standard, off-the-shelf evaluation tools often fall short, and how to combine them with the right strategy to ship trustworthy conversational agents.

What you’ll learn

  • What to evaluate: Task quality, grounding/retrieval, safety, robustness, and cost/latency.
  • How to evaluate: Datasets + scenarios, rubrics/judges, metrics, and human review where it matters.
  • Tooling strategy: When existing tools work and where you need customization.
  • Operationalization: How to run evaluations continuously as part of production readiness.

About the speaker
Akshay Anand is a Tech Principal at Thoughtworks, where he leads high-value enterprise Data & AI platforms that deliver nationwide impact. He has architected AI and GenAI solutions from strategy to production rollout, turning complex data into product-ready capabilities. He is deeply passionate about trustworthy AI by design: systems that are secure, transparent, and responsible.

Agenda
6:45 pm Registration and networking
7:00 pm Judging the Judge: Fixing Conversational Agent Evaluation by Akshay Anand
7:40 pm Q&A and networking

Event Details
📅 Date: Wednesday, March 18, 2026
🕡 Time: 6:45 - 8:15 pm
📍 Venue: 18 Cross Street, #12-01 18 Cross, Singapore 048423
🍕 Food and beverages will be provided

To join us, simply click "Attend" here on Meetup.

We hope to see you there!
