Your Evals Are Bad: Evaluation and the Model Development Lifecycle
Details
Register at the Luma event page:
https://luma.com/27ja5gwl
Join us for an exciting talk by Mary Gibbs, Senior Applied Scientist at Relativity.
Agenda:
6:00 - 6:30 PM - Welcome and mingle
6:30 - 6:45 PM - Introductions
6:45 - 7:30 PM - Talk
7:30 - 8:00 PM - Wrap up
Description:
If you have ever shipped a model, watched your metrics improve, and later learned from your users that something was wrong, then your metrics were wrong all along. You just didn’t know it yet. An evaluation consists of three components: a benchmark, a scorer, and a claim about what a score represents. Each component has its own weaknesses. Benchmarks can suffer from narrow coverage, contamination, or saturation. Scorers are often chosen because they are easy to automate or compute rather than because they align with user outcomes. And the claim connecting a score to reality is rarely made explicit. These gaps compound across the model development lifecycle. When metrics improve, teams treat the improvement as a signal of progress and optimize directly against the metric, which is how a measurement problem becomes a model problem. This talk maps where evaluations can go wrong, considers counterarguments, and ends with practical advice for building better ones.
Speaker Bio:
Mary is a Senior Applied Scientist at Relativity, tackling data science challenges in the e-discovery and legal tech space. She is also an organizer for Women and Gender eXpansive Coders DC (formerly Women Who Code DC), fostering a community dedicated to empowering women and nonbinary individuals to excel in their careers. Mary's experience spans various domains. She has developed data science solutions related to job search and career progression at Teal, cybersecurity challenges at LiveAction Software, and commercial and government consulting at Mosaic Data Science. Before moving into data science, Mary conducted and published research on the cellular and molecular mechanisms underlying neurodevelopment at the National Institutes of Health. In other words, she has dissected and imaged a lot of fruit fly brains. She holds an M.S. in Data Science from The George Washington University and a B.A. in Biological Sciences from Cornell University.