Measuring AI Quality: A Real-World Case Study of Atlassian’s RovoDev Coding Agent
Details
How can you tell if you are making your AI outputs better overall, or just different?
You tweak the prompt or change the overall architecture. The output looks better in some cases but worse in others.
As AI products grow in complexity, knowing whether we are truly improving quality, rather than just shifting the errors around, is one of the hardest challenges in the space. In this talk, the speaker will share a framework for measuring AI product quality, from offline LLM evaluation through to user behaviour and business outcomes, using a real-world case study based on his time working on RovoDev, Atlassian’s AI coding agent.
Whether you are building, shipping, or evaluating AI products, you will leave with a practical guide to understanding which improvements are worth making and when to pivot.
About the speaker: Wesley has 12+ years of experience in Data Science and AI. He began his career at Quantium, delivering data and ML solutions for major retailers including Walmart and Woolworths, before moving to Atlassian, where he oversaw Data Science for Bitbucket and the DevOps department.
He recently founded EmeraldRock Solutions, a Data and AI consulting and coaching business.
