LLM evaluation: A live demo
Hi all!
Join our next CTM Online event!
Agenda
7 - 7:10 pm (UTC+2): Intro
7:10 - 8:00 pm (UTC+2): LLM evaluation: A live demo (Anupam Krishnamurthy)
LLM evaluation: A live demo
As test automation engineers, we’ve relied on a bedrock of consistency to test software. We have worked hard to isolate and eliminate non-deterministic behaviour from our systems. Now we face the challenge of testing software that is non-deterministic by design.
In this session, I will demonstrate the inner workings of a Retrieval-Augmented Generation (RAG) system and show how you can evaluate it automatically by using a second LLM as a judge. Once we run the evals, we will pop the hood and examine the stack traces of the evaluation framework so that we can debug an unexpected result. We will then subject the RAG system to an indirect prompt injection attack.
Join me in this live demonstration, where we pit one LLM against another and try to expose a security flaw in the bargain.
8:00 - 8:15 pm (UTC+2): Q&A and open discussion
Join Zoom Meeting:
https://us06web.zoom.us/j/89442999005
Meeting ID: 894 4299 9005
Passcode: 159057
See you there! 🙂



