## Details

This is the fourth workshop in our series updating the LLM Zoomcamp content.

This workshop updates Module 4: Evaluation.

In this hands-on session, Alexey Grigorev will show how to evaluate retrieval and answer quality in a RAG application.

You’ll learn how to create ground truth data, evaluate search results, compare generated answers, and use both embedding-based metrics and LLM-as-a-Judge for offline evaluation.
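
For a taste of the ground-truth step, the sketch below generates candidate questions from existing FAQ records with an LLM. Everything here is an assumption for illustration: the `documents.json` file, the prompt wording, and the model choice are placeholders, not the workshop’s exact setup.

```python
# Minimal sketch: generate ground truth questions from FAQ records
# with an LLM. documents.json, the prompt, and the model are all
# hypothetical placeholders.
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt_template = """You emulate a student using our course FAQ.
Based on the record below, formulate 5 questions this student might ask.
Each question must be answerable from the record alone.

Record: {record}

Output only a JSON list of 5 question strings."""

with open("documents.json") as f:
    documents = json.load(f)  # hypothetical: one dict per FAQ record, with an "id"

ground_truth = []
for doc in documents:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works for this
        messages=[{"role": "user",
                   "content": prompt_template.format(record=json.dumps(doc))}],
    )
    # Assumes the model returns bare JSON; real code should parse defensively.
    for question in json.loads(response.choices[0].message.content):
        ground_truth.append({"question": question, "document_id": doc["id"]})
```

Each generated question is linked back to the record it came from, which is what lets retrieval be scored automatically later: a search result counts as relevant exactly when it returns that record.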

What you’ll learn:

  • Why evaluation is important for LLM applications
  • What can go wrong in RAG systems without systematic evaluation
  • How to create ground truth data for retrieval evaluation
  • How to use an LLM to generate evaluation data
  • How to evaluate text search results
  • How ranking metrics work for retrieval evaluation (see the sketch after this list)
  • How offline and online evaluation compare
  • How to generate data for offline RAG evaluation
  • How to use embeddings and cosine similarity to compare answers
  • How to compare answers from different models
  • How to use LLM-as-a-Judge for answer evaluation
  • How to evaluate answers with A→Q→A’ and Q→A approaches

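To make the ranking-metrics item above concrete, here is a minimal self-contained sketch of two common metrics, hit rate and MRR (mean reciprocal rank), computed over toy relevance judgments; the numbers are illustrative only:

```python
# Minimal sketch of two standard ranking metrics: hit rate and MRR
# (mean reciprocal rank). Each inner list marks, per returned result,
# whether it was the relevant document for that ground-truth question.

def hit_rate(relevance_lists):
    # Fraction of queries where the relevant document shows up at all.
    return sum(any(hits) for hits in relevance_lists) / len(relevance_lists)

def mrr(relevance_lists):
    # Mean of 1/rank of the first relevant result (0 when never found).
    total = 0.0
    for hits in relevance_lists:
        for rank, is_relevant in enumerate(hits, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(relevance_lists)

# Toy example: 3 queries, top-5 results each (True = relevant doc returned).
relevance = [
    [True, False, False, False, False],   # found at rank 1
    [False, False, True, False, False],   # found at rank 3
    [False, False, False, False, False],  # never found
]
print(f"hit rate: {hit_rate(relevance):.3f}")  # 2/3 ≈ 0.667
print(f"MRR:      {mrr(relevance):.3f}")       # (1 + 1/3 + 0) / 3 ≈ 0.444
```
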
By the end, you’ll understand how to measure the quality of a RAG system instead of relying only on manual testing. You’ll have notebooks and datasets for evaluating both retrieval and generated answers.
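
As a preview of the embedding-based comparison mentioned above, the sketch below scores how close a generated answer is to the original one with cosine similarity. The `sentence-transformers` model name is one common choice, not necessarily the one used in the workshop:

```python
# Sketch: compare an original and a generated answer with embedding
# cosine similarity. The model name is a common default, not
# necessarily the workshop's choice.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: a.b / (|a| * |b|).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

answer_original = "You can still join the course after it has started."
answer_generated = "Yes, enrollment stays open even after the start date."

v_orig, v_gen = model.encode([answer_original, answer_generated])
print(f"cosine similarity: {cosine_similarity(v_orig, v_gen):.3f}")
```

LLM-as-a-Judge swaps the numeric similarity for a structured verdict from a model. A minimal sketch, assuming an OpenAI-compatible client; the prompt wording and labels are placeholders:

```python
# Sketch: LLM-as-a-Judge. Ask a model for a structured relevance
# verdict; the prompt wording, labels, and model are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

judge_prompt = """You evaluate answers produced by a RAG system.
Classify the generated answer as RELEVANT, PARTLY_RELEVANT, or
NON_RELEVANT with respect to the question, and explain briefly.

Question: Can I still join the course after it has started?
Generated answer: Yes, enrollment stays open even after the start date.

Respond in JSON with keys "relevance" and "explanation"."""

verdict = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": judge_prompt}],
)
print(verdict.choices[0].message.content)
```

The A→Q→A’ variant applies the same idea in reverse: generate a question from the original answer, answer it with the RAG system, and judge the new answer against the original; Q→A judges the generated answer directly against the question.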

Like the other workshops, this will be a live demo with practical tips and time for Q&A.

***

All events in this series:

  1. Build Your First RAG Application with LLMs
  2. From RAG to AI Agents: Function Calling and Tool Use
  3. Vector Databases: Embeddings, Semantic Search, and Hybrid Retrieval
  4. RAG and Agents Evaluation: Measuring Retrieval and LLM Answer Quality
  5. Monitoring LLM Applications: Traces, Feedback, and Production Quality

***

## Thinking about Joining LLM Zoomcamp?

This workshop covers the updated content for Module 4 of the LLM Zoomcamp, our free course on building practical LLM applications with RAG, vector search, evaluation, monitoring, and AI agents.
You start with a simple RAG pipeline, then improve it with better retrieval, semantic search, function calling, evaluation, monitoring, and production practices.
The course covers the full lifecycle of an LLM application: from the first working prototype to evaluation, monitoring, and a complete final project.
The new cohort of LLM Zoomcamp starts on June 8, 2026. You can join it by registering here.

## About the Speaker

Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.

Alexey is a software and ML engineer with over 10 years in engineering and 6+ years in machine learning. He has deployed large-scale ML systems at companies like OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a 1st place finish in the NIPS’17 Criteo Challenge.

**Join our Slack: https://datatalks.club/slack.html**
