PyData Roma Capitale Meetup @ Almawave
Details
We’re back with a new event hosted at Almawave, featuring a deep dive into Large Language Models (LLMs) and the critical challenge of evaluating them in production. Generic accuracy scores rarely reflect how a model performs on real-world tasks. If you are fine-tuning models, building RAG systems, or simply want to learn how to measure what actually matters, this event is for you!
⚠️ IMPORTANT: Security & Logistics (Please Read Below) ⬇️
Where: Almawave - Via di Casal Boccone 188, Roma
When: 24/03/2026, 18:00
Schedule:
- 18:00 🚪 Door Opening & Security Check
- 18:15 📢 Introduction to PyData Roma Capitale & Almawave
- 18:30 🎤 Talk — ValVet: An evaluation framework for large language models (Salvatore Ricciardi, Simone Scaboro)
- 19:30 🤝 Socializing & Aperitivo
- 20:30 🔚 Closing
ValVet: An evaluation framework for large language models
As large language models become increasingly embedded in production systems, the need for rigorous, reproducible, and domain-specific evaluation grows more urgent. Standard benchmarks often fall short when assessing model performance on real-world tasks such as retrieval-augmented generation (RAG) or multilingual customer support, leaving practitioners without reliable signals to guide model selection and deployment decisions.
We introduce ValVet, a Python evaluation framework for defining, composing, and running custom evaluation metrics tailored to specific use cases. The name stands for eValuation Velvet — a nod to Velvet, the family of large language models developed by Almawave. While born from the need to rigorously evaluate the Velvet models, ValVet is entirely model-agnostic and can be used to benchmark any LLM.
During this talk, we will walk through the design principles behind ValVet, showing how it leverages Python’s ecosystem — including pandas, sentence-transformers, and async processing — to scale evaluation across thousands of model outputs efficiently. We will show examples on evaluation datasets, created from real use-cases or adapted from existing benchmarks, giving you a concrete toolkit and a fresh perspective on how to evaluate your models.
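To give a flavour of the metric-composition idea the abstract describes, here is a minimal, self-contained sketch in plain Python. All names (`Metric`, `evaluate`, the example metrics) are illustrative assumptions, not ValVet’s actual API, and token-level F1 stands in for the heavier semantic-similarity scoring (e.g. via sentence-transformers) the talk will cover:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A metric is just a named scoring function: (prediction, reference) -> float.
@dataclass
class Metric:
    name: str
    fn: Callable[[str, str], float]

def exact_match(pred: str, ref: str) -> float:
    """1.0 if prediction and reference match after normalization, else 0.0."""
    return float(pred.strip().lower() == ref.strip().lower())

def token_f1(pred: str, ref: str) -> float:
    """Token-overlap F1 — a cheap stand-in for embedding-based similarity."""
    p, r = pred.lower().split(), ref.lower().split()
    common = len(set(p) & set(r))
    if not p or not r or common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)

def evaluate(rows: List[Tuple[str, str]],
             metrics: List[Metric]) -> Dict[str, float]:
    """Run every metric over every (prediction, reference) pair and average."""
    results: Dict[str, List[float]] = {m.name: [] for m in metrics}
    for pred, ref in rows:
        for m in metrics:
            results[m.name].append(m.fn(pred, ref))
    return {name: sum(vals) / len(vals) for name, vals in results.items()}

# Compose a suite of metrics and score a tiny evaluation set.
suite = [Metric("exact_match", exact_match), Metric("token_f1", token_f1)]
data = [
    ("Rome is the capital of Italy", "Rome is the capital of Italy"),
    ("The capital is Rome", "Rome is the capital of Italy"),
]
scores = evaluate(data, suite)
```

Because each metric is an independent callable, swapping in a domain-specific scorer (or running the loop concurrently for model-backed metrics) changes nothing about the surrounding pipeline — which is the kind of composability the talk motivates.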
⚠️ IMPORTANT: Security & Logistics
Space is limited, so RSVP today to secure your spot! Due to corporate security policies, please pay attention to the following rules:
- Registration will close on 20 March; you MUST RSVP with your first and last name. Access is granted via a guest list and entry badges. If your full name is not provided, you risk being denied entry. (PyData organizers are not responsible for denied access due to incomplete registration.)
- Arrival & Parking: The venue is not easy to reach by public transport, so we recommend driving (carpool where possible); ample parking is available on-site.
- Security Check: Before entering the parking area, you must stop at the security barrier, where you will need to leave a valid ID to collect your entry badge. Please remember to return your badge and collect your ID before leaving!
