Name: LLM evaluation: A live demo
Start: 2026-04-15T19:00:00+02:00
End: 2026-04-15T21:00:00+02:00

Hi all!

Join our next CTM Online event!

**Agenda**
7:00 - 7:10 pm (UTC+2): Intro
7:10 - 8:00 pm (UTC+2): LLM evaluation: A live demo ([Anupam Krishnamurthy](https://www.linkedin.com/in/anupam-krishnamurthy/))

**LLM evaluation: A live demo**

As test automation engineers, we’ve relied on a bedrock of consistency to test software. We tried our best to isolate and eliminate [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm) behaviour from our systems. And now we’re faced with the challenge of testing software that is non-deterministic by design.
In this session, I will demonstrate the inner workings of a [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) model, and how you can subject it to automated evaluation by using a second model as an [LLM-as-a-judge](https://arxiv.org/abs/2306.05685). Once we run the evals, we will pop the hood and examine the stack traces of the evaluation framework, so that we can debug an unexpected result. We will then subject the RAG model to an [indirect prompt injection attack](https://owasp.org/www-project-top-10-for-large-language-model-applications/).
Join me in this live demonstration, where we pit one LLM against another, and try to expose a security flaw in the bargain.

8:00 - 8:15 pm (UTC+2): Q&A and open discussion

**Join Zoom Meeting:**
https://us06web.zoom.us/j/89442999005
Meeting ID: 894 4299 9005
Passcode: 159057

See you there! 🙂

Aurélien

Aksana Matulskaya

Anupam Krishnamurthy

CTM Continuous Testing Meetup Berlin

Continuous Testing Meetup

Sauce Labs is the leader in continuous quality.

Sauce Labs

TUI is committed to supporting the Continuous Testing Meetup community.

Trendig is committed to supporting the CTM community.

Trendig
