How to run benchmarks for LLMs at scale

Hosted By
Doron C.

Details

Abstract
With the rapid growth of large language models (LLMs), benchmarking has become a critical but often misunderstood step in model evaluation and deployment. Traditional leaderboards offer limited insight into how models perform under real-world constraints. In this talk, we’ll explore how to design and run LLM benchmarks at scale: across multiple models, hardware, and load configurations. We’ll cover how to build reproducible, scalable benchmarking pipelines that surface meaningful trade-offs between latency, throughput, cost, and accuracy. As part of this session, we’ll also dive into Red Hat’s newly launched Third-Party Validated Models program and share how we conducted large-scale benchmarking to support it.
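
To give a flavor of the kind of measurements such a pipeline collects, here is a minimal sketch in Python that times concurrent requests and reports latency percentiles and throughput under a fixed concurrency level. The call_model function is a hypothetical stand-in for a real inference client; this is an illustrative sketch, not the pipeline discussed in the talk.

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Hypothetical placeholder for a real LLM inference call
    # (e.g. an HTTP request to a serving endpoint).
    time.sleep(0.05)  # simulate ~50 ms of model latency
    return "response"

def run_benchmark(prompts: list[str], concurrency: int) -> dict:
    latencies: list[float] = []

    def timed_call(prompt: str) -> None:
        start = time.perf_counter()
        call_model(prompt)
        latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, prompts))  # wait for all requests
    wall_elapsed = time.perf_counter() - wall_start

    return {
        "requests": len(prompts),
        "concurrency": concurrency,
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": statistics.quantiles(latencies, n=20)[18],
        "throughput_rps": len(prompts) / wall_elapsed,
    }

if __name__ == "__main__":
    # Example run: 100 identical prompts at a concurrency of 8.
    print(run_benchmark(["hello"] * 100, concurrency=8))

Sweeping the concurrency parameter (and repeating across models and hardware) is what surfaces the latency/throughput trade-offs the abstract refers to.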

About the Speaker
Roy is an AI and HPC expert with over a decade of experience building advanced AI systems. He recently joined Red Hat through the acquisition of Jounce, where he served as CEO. Roy is a Talpiot alumnus, holding a PhD in computer science and a GMBA.

Tech Talks
Online event
FREE