Trust Issues: Evaluating GenAI in the Wild
✨ Disclaimer
We welcome everyone to our events. That said, we encourage male attendees to bring a female friend as their +1, so please share this event with your female colleagues and friends!
🔹 Synopsis:
With the increasing adoption of GenAI systems in production applications, one question looms large: how do we know they’re reliable? This talk explores the critical need for rigorously evaluating GenAI applications and dives into practical ways to do it right. Covering everything from common evaluation methodologies such as LLM-as-a-judge to more sophisticated techniques like red teaming and guardrailing, the session will equip attendees with clear strategies for making GenAI systems safer, smarter, and more trustworthy.
To ground these ideas, the session will showcase hands-on examples of stress-testing models at scale using readily available frameworks. Expect a fast-paced, practical session that leaves participants with actionable techniques, toolkits, and a roadmap for evaluating their own GenAI systems with confidence.
✨ About Div(0):
Div0 is an open, inclusive, and volunteer-driven cybersecurity community group. Div0 provides a platform where cybersecurity professionals, practitioners, and enthusiasts can meet like-minded people, explore and learn with peers, and contribute to the community. Div0 does so by organising events, driving programmes and initiatives, encouraging collaborations and contributions, and reaching out to the public.