AI-Driven Test Strategy and RAG Evaluation Strategies for LLM Systems


Scaling QA with LLMs – Automating Test Design and Execution for Lean Product Teams
In many agile product teams, limited QA, BA, and development capacity makes it hard to build and maintain robust regression suites. This session demonstrates how Large Language Models (LLMs) can be leveraged to automate QA processes, reducing manual effort while improving test coverage and speed.
We’ll walk through a practical, end-to-end AI-augmented QA workflow designed for fast-moving teams with lean resources:
- Jira Story Analysis – Using LLMs to extract meaningful test scenarios from user stories and acceptance criteria (a minimal sketch follows this list).
- Automated Test Case Generation – Creating structured, non-redundant test cases with a feedback loop for team input.
- Code Generation for Automation – Producing executable test scripts for both API and UI layers that align with existing automation frameworks.
- Agentic QA Workflow – Leveraging autonomous AI agents to optimise and manage testing workflows.
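As a taste of the first two steps, here is a minimal sketch of turning a Jira story into structured test scenarios with an LLM. It assumes the OpenAI Python SDK; the model name, prompt wording, and output schema are illustrative choices, not the exact setup shown in the session.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_prompt(story_text: str) -> str:
    # Prompt wording and schema are illustrative assumptions.
    return (
        "You are a QA analyst. From the user story and acceptance criteria "
        "below, derive distinct, non-redundant test scenarios. Return JSON of "
        'the form {"scenarios": [{"title": str, "steps": [str], "expected": str}]}.'
        "\n\nStory:\n" + story_text
    )


def extract_scenarios(story_text: str) -> list[dict]:
    """Ask the LLM for structured test scenarios for one Jira story."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # assumed model choice
        messages=[{"role": "user", "content": build_prompt(story_text)}],
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    return json.loads(response.choices[0].message.content)["scenarios"]
```

The same structured output can then feed the test-case generation and code-generation steps, with a human review loop before anything lands in the regression suite.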
Evaluating RAG Systems – Practical Techniques to Improve Retrieval and Answer Quality
Retrieval-Augmented Generation (RAG) systems are powerful, but how to evaluate them often remains ambiguous, especially when organisations want reliable performance without over-relying on LLM judgement. In this session, we explore structured, practical evaluation approaches that blend human judgement, ground-truth preparation, and both traditional and LLM-based metrics.
We’ll walk through a hands-on evaluation methodology with a focus on reproducibility, observability, and organisational alignment:
- Preparing high-quality ground truth data
- Human-in-the-loop review and iterative improvement
- Evaluation techniques for the retriever and for answer generation (see the sketches after this list)
- Comparing manual vs. LLM-based judgement
- Interpreting results and feeding them into system improvement cycles
- Observability practices for long-term monitoring
- Adapting evaluations to domain-specific needs
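On the retriever side, traditional metrics need nothing more than a prepared ground-truth set. The sketch below computes hit rate (recall@k) and mean reciprocal rank; the data shapes (a query-to-relevant-IDs mapping and a retriever returning ranked document IDs) are illustrative assumptions.

```python
from typing import Callable


def evaluate_retriever(
    retrieve: Callable[[str, int], list[str]],  # (query, k) -> ranked doc IDs
    ground_truth: dict[str, set[str]],          # query -> relevant doc IDs
    k: int = 5,
) -> dict[str, float]:
    """Compute hit rate (recall@k) and mean reciprocal rank over all queries."""
    hits, rr_sum = 0, 0.0
    for query, relevant in ground_truth.items():
        ranked = retrieve(query, k)
        if any(doc_id in relevant for doc_id in ranked):
            hits += 1
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                rr_sum += 1.0 / rank  # reciprocal rank of first relevant hit
                break
    n = len(ground_truth)
    return {"hit_rate@k": hits / n, "mrr@k": rr_sum / n}
```

Run the same evaluation before and after a change to chunking, embeddings, or reranking, and the delta feeds directly into the improvement cycle.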
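For answer quality, an LLM-based judge can be compared directly against manual labels. Below is a minimal sketch, again assuming the OpenAI SDK; the judge model and rubric are illustrative. Agreement between its verdicts and human reviewers (simple percentage agreement, or Cohen's kappa) indicates whether the judge can stand in for manual review at scale.

```python
from openai import OpenAI

client = OpenAI()


def judge_answer(question: str, reference: str, answer: str) -> bool:
    """Ask an LLM judge whether the answer is faithful to the reference."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{
            "role": "user",
            "content": (
                "Does the answer below correctly and completely address the "
                "question, consistent with the reference? Reply YES or NO.\n\n"
                f"Question: {question}\nReference: {reference}\nAnswer: {answer}"
            ),
        }],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("YES")
```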
