Running Evaluation Metrics With Different LLMs + Q/A


Details
In this beginner-friendly 40-minute workshop, you’ll learn a simple, repeatable way to evaluate Q&A answers from different LLMs using a tiny dataset and two complementary approaches: basic automatic scores (Exact Match/F1) and an “LLM-as-Judge” rubric covering Correctness, Faithfulness, Relevance, and Conciseness. We’ll show how to compare models fairly (same prompt and settings, temperature=0, consistent context), how to interpret the results, and how to turn findings into actions using a light Analyze → Measure → Open Coding → Axial Coding loop. You’ll leave with a plug-and-play rubric, a mini dataset template, and a beginner notebook that generates a clear side-by-side report, so you can pick the right model with confidence and iterate quickly. The session wraps up with an AI Residency Q&A.
Add the DDS Google Calendar link so that you don’t miss any events.
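
If you want a head start, here is a minimal sketch of the kind of Exact Match and token-level F1 scoring the workshop covers. The SQuAD-style normalization (lowercasing, stripping punctuation and articles) is a common convention and an assumption here, not necessarily the notebook’s exact code.

    import re
    import string
    from collections import Counter

    def normalize(text: str) -> str:
        # SQuAD-style normalization (assumed): lowercase, drop punctuation and articles.
        text = text.lower()
        text = "".join(ch for ch in text if ch not in string.punctuation)
        text = re.sub(r"\b(a|an|the)\b", " ", text)
        return " ".join(text.split())

    def exact_match(prediction: str, reference: str) -> float:
        # 1.0 if the normalized strings match exactly, else 0.0.
        return float(normalize(prediction) == normalize(reference))

    def token_f1(prediction: str, reference: str) -> float:
        # Harmonic mean of token-level precision and recall.
        pred_tokens = normalize(prediction).split()
        ref_tokens = normalize(reference).split()
        common = Counter(pred_tokens) & Counter(ref_tokens)
        overlap = sum(common.values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)

For example, exact_match("Paris.", "paris") returns 1.0, while token_f1("the city of Paris", "Paris, France") returns 0.4, giving partial credit for the overlapping token.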
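
The LLM-as-Judge side can be as simple as one rubric prompt that returns structured scores for the four criteria. The prompt wording, the 1–5 scale, and the call_model helper below are illustrative assumptions; any client that runs a judge model at temperature=0 would work.

    import json

    JUDGE_PROMPT = """You are grading a Q&A answer. Score each criterion from 1 (poor) to 5 (excellent).

    Question: {question}
    Reference context: {context}
    Candidate answer: {answer}

    Return JSON only, e.g. {{"correctness": 4, "faithfulness": 5, "relevance": 4, "conciseness": 3}}."""

    def judge_answer(question: str, context: str, answer: str, call_model) -> dict:
        # call_model(prompt) -> str is a hypothetical wrapper around your LLM client.
        prompt = JUDGE_PROMPT.format(question=question, context=context, answer=answer)
        raw = call_model(prompt)  # run the judge at temperature=0 for repeatable scores
        return json.loads(raw)    # e.g. {"correctness": 4, "faithfulness": 5, ...}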
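
Fair comparison then comes down to a small harness that feeds every model the identical prompt and settings and tabulates the scores side by side. This sketch reuses exact_match and token_f1 from above; call_model(model, prompt) is again a hypothetical wrapper that must pin temperature=0 for every model.

    def compare_models(dataset, models, call_model):
        # dataset: list of {"question", "context", "reference"} dicts.
        rows = []
        for model in models:
            em_total, f1_total = 0.0, 0.0
            for item in dataset:
                # Identical prompt and context for every model keeps the comparison fair.
                prompt = f"Context: {item['context']}\nQuestion: {item['question']}\nAnswer:"
                answer = call_model(model, prompt)
                em_total += exact_match(answer, item["reference"])
                f1_total += token_f1(answer, item["reference"])
            rows.append((model, em_total / len(dataset), f1_total / len(dataset)))
        # Simple side-by-side report.
        print(f"{'model':<20}{'EM':>8}{'F1':>8}")
        for model, em, f1 in rows:
            print(f"{model:<20}{em:>8.2f}{f1:>8.2f}")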
AI Unlocked: Trends, Talent & Opportunities
Ready to dive into the future of AI?
Join us for a high-energy session exploring the latest breakthroughs, real-world use cases, and how YOU can ride the AI wave—whether you’re a student, pro, or just curious.
Date: Tuesday, July 22
👉 Register here: https://nas.io/artificialintelligence/events/evaluating-rag-systems-for-accuracy-trust-and-impact
✨ What’s Inside:
🔹 Hottest trends shaping AI today
🔹 How industries are using AI to win
🔹 Behind the scenes of our exclusive AI Residency Program
🔹 How to join the AI Challenge (and win big!)
🔹 Live Q&A to get your questions answered
This is your gateway to becoming part of the next-gen AI revolution.
Don’t miss out!

Every Tuesday until October 28, 2025: Running Evaluation Metrics With Different LLMs + Q/A