How to run benchmarks for LLMs at scale


Abstract
With the rapid growth of large language models (LLMs), benchmarking has become a critical but often misunderstood step in model evaluation and deployment. Traditional leaderboards offer limited insight into how models perform under real-world constraints. In this talk, we'll explore how to design and run LLM benchmarks at scale across multiple models, hardware platforms, and load configurations. We'll cover how to build reproducible, scalable benchmarking pipelines that surface meaningful trade-offs between latency, throughput, cost, and accuracy. As part of this session, we'll also dive into Red Hat's newly launched Third-Party Validated Models program and share how we conducted large-scale benchmarking to support it.
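
To make the idea of sweeping models and load configurations concrete, here is a minimal sketch of what such a benchmarking loop might look like. It assumes an OpenAI-compatible completions endpoint (for example, a locally served model), and the endpoint URL, model names, concurrency levels, and use of the httpx client are all illustrative assumptions, not details from the talk or the Validated Models program.

```python
"""Minimal sketch: sweep models and concurrency levels against an
OpenAI-compatible completions endpoint, recording latency and throughput.
All endpoint/model/concurrency values below are placeholder assumptions."""
import asyncio
import statistics
import time

import httpx  # assumed async HTTP client; any equivalent works

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed local server
MODELS = ["model-a", "model-b"]                    # placeholder model names
CONCURRENCY_LEVELS = [1, 8, 32]                    # load configurations to sweep
PROMPT = "Summarize the benefits of reproducible benchmarking."


async def one_request(client: httpx.AsyncClient, model: str) -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = await client.post(
        ENDPOINT,
        json={"model": model, "prompt": PROMPT, "max_tokens": 128},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start


async def run_level(model: str, concurrency: int, total: int = 64) -> dict:
    """Fire `total` requests under a bounded concurrency and aggregate metrics."""
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:

        async def bounded() -> float:
            async with sem:
                return await one_request(client, model)

        wall_start = time.perf_counter()
        latencies = await asyncio.gather(*[bounded() for _ in range(total)])
        wall = time.perf_counter() - wall_start

    return {
        "model": model,
        "concurrency": concurrency,
        "p50_latency_s": statistics.median(latencies),
        "throughput_rps": total / wall,
    }


async def main() -> None:
    # Sweep every (model, load) combination and print one result row per run.
    for model in MODELS:
        for level in CONCURRENCY_LEVELS:
            print(await run_level(model, level))


if __name__ == "__main__":
    asyncio.run(main())
```

A production pipeline would add warm-up runs, token-level throughput, cost and accuracy tracking, and result storage for reproducibility, but the core pattern of sweeping configurations and aggregating per-run metrics stays the same.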
About the Speaker
Roy is an AI and HPC expert with over a decade of experience in building advanced AI systems. He recently joined Red Hat through the acquisition of Jounce, where he served as the CEO. Roy is a Talpiot alumnus, holding a PhD in computer science and a GMBA.