RAG and Agents Evaluation: Measuring Retrieval and LLM Answer Quality
## Details
This is the fourth workshop in our series updating the LLM Zoomcamp content.
This workshop updates Module 4: Evaluation.
In this hands-on session, Alexey Grigorev will show how to evaluate retrieval and answer quality in a RAG application.
You’ll learn how to create ground truth data, evaluate search results, compare generated answers, and use both embedding-based metrics and LLM-as-a-Judge for offline evaluation.
What you’ll learn:
- Why evaluation is important for LLM applications
- What can go wrong in RAG systems without systematic evaluation
- How to create ground truth data for retrieval evaluation
- How to use an LLM to generate evaluation data
- How to evaluate text search results
- How ranking metrics work for retrieval evaluation (illustrated in the short sketch after this list)
- How to compare offline and online evaluation
- How to generate data for offline RAG evaluation
- How to use embeddings and cosine similarity to compare answers
- How to compare answers from different models
- How to use LLM-as-a-Judge for answer evaluation
- How to evaluate answers with A→Q→A’ and Q→A approaches
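To give a flavor of the metrics involved, here is a minimal sketch (not the workshop's actual notebooks) of how hit rate and MRR can score retrieval results against ground truth, and how cosine similarity can compare the embedding of an original answer with a generated one. The relevance structure and example numbers below are made up for illustration.

```python
import numpy as np

# `relevance` is a list of boolean lists: for each ground-truth question,
# whether each returned document matched the expected one.

def hit_rate(relevance):
    # Fraction of queries where the relevant document appears anywhere in the results
    return sum(True in row for row in relevance) / len(relevance)

def mrr(relevance):
    # Mean Reciprocal Rank: 1 / position of the first relevant document, averaged over queries
    total = 0.0
    for row in relevance:
        for rank, is_relevant in enumerate(row):
            if is_relevant:
                total += 1.0 / (rank + 1)
                break
    return total / len(relevance)

def cosine_similarity(u, v):
    # Compare two answer embeddings: close to 1.0 means the answers point in the same direction
    u, v = np.array(u), np.array(v)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical example: 3 questions, top-3 search results each
relevance = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
]
print(hit_rate(relevance))  # 0.67
print(mrr(relevance))       # (1/1 + 1/3 + 0) / 3 ≈ 0.44
print(cosine_similarity([0.1, 0.9], [0.2, 0.8]))  # ≈ 0.99
```

The workshop goes through these ideas step by step, including how to build the ground truth data that makes such metrics meaningful.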
By the end, you’ll understand how to measure the quality of a RAG system instead of relying only on manual testing. You’ll have notebooks and datasets for evaluating both retrieval and generated answers.
Like the other workshops, this will be a live demo with practical tips and time for Q&A.
***
All events in this series:
- Build Your First RAG Application with LLMs
- From RAG to AI Agents: Function Calling and Tool Use
- Vector Databases: Embeddings, Semantic Search, and Hybrid Retrieval
- RAG and Agents Evaluation: Measuring Retrieval and LLM Answer Quality
- Monitoring LLM Applications: Traces, Feedback, and Production Quality
***
## Thinking about Joining LLM Zoomcamp?
This workshop covers the updated content for Module 4 of the LLM Zoomcamp, our free course on building practical LLM applications with RAG, vector search, evaluation, monitoring, and AI agents.
You start with a simple RAG pipeline, then improve it with better retrieval, semantic search, function calling, evaluation, monitoring, and production practices.
The course covers the full lifecycle of an LLM application: from the first working prototype to evaluation, monitoring, and a complete final project.
The new cohort of LLM Zoomcamp starts on June 8, 2026. You can join it by registering here.
## About the Speaker
Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.
Alexey is a software and ML engineer with over 10 years of engineering experience and 6+ years in machine learning. He has deployed large-scale ML systems at companies such as OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a first-place finish in the NIPS'17 Criteo Challenge.
**Join our Slack: https://datatalks.club/slack.html**
