Synthetic Data Assessment and Best Practices for Data Department Efficacy


Details
📅 Date: 22-01-2025
📍 Location: Talentware @ WAO PL7, Via Luigi Porro Lambertenghi 7 (Isola/Zara/Garibaldi FS)
Agenda
- 18:30 - Doors Open
- 18:50 - Welcome to PyData
- 19:00 - Talk 1: SURE: A New Privacy and Utility Assessment Library for Synthetic Data
Speakers: Luca Gilli, Co-founder and CTO @ Clearbox AI; Dario Brunelli, ML Engineer @ Clearbox AI
- 19:40 - Talk 2: Going beyond traditional software playbook to build an effective data department
Speaker: Raffaele Bongo, Senior ML Engineer & Data Solutions Architect @ Talentware
- 20:20 - Networking & Aperitivo offered by Talentware
Join us for dinner after the event to continue chatting!
Talks and Speakers
SURE: A New Privacy and Utility Assessment Library for Synthetic Data
In this talk, we will explore the core principles of validating and evaluating synthetic data across diverse domains, highlighting the importance of scalable and robust solutions. We will introduce SURE, an open-source library designed to assess privacy risks and utility trade-offs in synthetic datasets. By showcasing how SURE leverages the power of Polars to efficiently generate detailed and insightful reports, we’ll demonstrate its role in streamlining synthetic data assessment. Finally, we will discuss the current open challenges in its development and future directions.
Dario Brunelli ventured into the world of data with a Master’s in Quantum Machine Learning, leveraging his M.Sc. in Electrical Engineering and a dynamic background in motorsport. As a Machine Learning Engineer at Clearbox AI, he specializes in synthetic data generation, with a focus on data privacy and time-series generation. When he's not coding, Dario enjoys photography, cycling, and surfing.
Luca Gilli is the co-founder of Clearbox AI, a company specializing in synthetic data generation solutions to drive innovation and enhance data privacy. He holds a PhD in computational mathematics from Delft University of Technology. He worked for over 5 years as a scientific software developer at a consultancy firm in the Netherlands. Outside of work, he enjoys hiking in the beautiful Piedmontese mountains. He’s also passionate about farming, with a special interest in experimenting with hydroponic techniques.
Going beyond traditional software playbook to build an effective data department
The journey to AI maturity is full of challenges. In Italy, while 65-70% of companies have embraced digitalization and 60% utilize cloud computing, only 9-11% have achieved mature AI solutions. This disparity underscores the evolving nature of AI as a discipline and its distinction from traditional software development. With over 80% of AI projects failing due to unclear expectations, inadequate frameworks, and communication gaps, the need for a structured approach has never been more urgent. This talk introduces the foundational pillars of Talentware’s Data Department, which are:
- Stakeholder Evangelization: Ensuring that all stakeholders grasp the experimental and iterative nature of AI projects and the massive importance of data quality.
- Specific Operational Framework: Emphasizing clear business requirements, shared success metrics, and the delivery of incremental, high-value outputs.
- Clear and Frequent Communication: Establishing a shared glossary, providing regular updates, and using structured frameworks to convey the complexity and value of AI tasks effectively.
By addressing these pillars, the session highlights how AI projects differ from traditional software and how their success depends heavily on professionals at every level understanding this distinction.
Raffaele Bongo is a Senior Machine Learning Engineer and Data Solutions Architect at Talentware. Raffaele has extensive experience in startups and scale-ups, where he has specialized in building and deploying machine learning models and data pipelines on cloud platforms. Originally, Raffaele signed up for mechanical engineering but switched to computer science at the last minute because AI sounded cooler. Great call!
