
Running Evaluation Metrics With Different LLMs + Q/A

Hosted By
Mohammad A. and Patricia M.

Details

In this beginner-friendly 40-minute workshop, you’ll learn a simple, repeatable way to evaluate Q&A answers from different LLMs using a tiny dataset and two complementary approaches: basic automatic scores (Exact Match / F1) and an “LLM-as-Judge” rubric covering Correctness, Faithfulness, Relevance, and Conciseness. We’ll show how to compare models fairly (same prompt and settings, temperature=0, consistent context), how to interpret the results, and how to turn findings into actions using a light Analyze → Measure → Open Coding → Axial Coding loop. You’ll leave with a plug-and-play rubric, a mini dataset template, and a beginner notebook that generates a clear side-by-side report, so you can pick the right model with confidence and iterate quickly.

The session also includes an AI Residency Q/A. Add the DDS Google calendar so that you don’t miss any events.
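For reference, here is a minimal Python sketch of the two scoring approaches described above: Exact Match / token-level F1, plus an LLM-as-Judge rubric prompt. This is not the workshop notebook; the normalization rules, the dataset fields, the JUDGE_RUBRIC text, and the model names are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the workshop notebook) of Exact Match / F1
# scoring and an LLM-as-Judge rubric prompt for comparing Q&A answers.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> int:
    """1 if the normalized prediction equals the normalized reference, else 0."""
    return int(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical rubric prompt for the LLM-as-Judge pass; the judge call itself
# would go through whatever client the notebook uses (assumption).
JUDGE_RUBRIC = """Score the answer from 1-5 on each criterion:
- Correctness: does it match the reference answer?
- Faithfulness: is it supported by the provided context?
- Relevance: does it address the question?
- Conciseness: is it free of unnecessary content?
Return JSON like {"correctness": 5, "faithfulness": 4, "relevance": 5, "conciseness": 3}."""

# Mini dataset row and a side-by-side comparison of two hypothetical model outputs,
# both generated with the same prompt, temperature=0, and context for fairness.
row = {
    "question": "What year was the transformer architecture introduced?",
    "reference": "2017",
}
outputs = {"model_a": "2017", "model_b": "It was introduced in 2017."}
for model, answer in outputs.items():
    print(model,
          "EM:", exact_match(answer, row["reference"]),
          "F1:", round(f1_score(answer, row["reference"]), 2))
```

Running this over a small dataset and printing one line per model per question is enough to produce the kind of side-by-side report the workshop describes; the judge scores would be averaged alongside EM/F1 in the same table.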

AI Unlocked: Trends, Talent & Opportunities

Ready to dive into the future of AI?
Join us for a high-energy session exploring the latest breakthroughs, real-world use cases, and how YOU can ride the AI wave—whether you’re a student, pro, or just curious.

Date: July 22, Tuesday
👉 Register here: https://nas.io/artificialintelligence/events/evaluating-rag-systems-for-accuracy-trust-and-impact

✨ What’s Inside:
🔹 Hottest trends shaping AI today
🔹 How industries are using AI to win
🔹 Behind the scenes of our exclusive AI Residency Program
🔹 How to join the AI Challenge (and win big!)
🔹 Live Q&A to get your questions answered

This is your gateway to becoming part of the next-gen AI revolution.

Don’t miss out!

MENA AI and Data Community

Every week on Tuesday until October 28, 2025

Online event
This event has passed