
Synthetic Data Assessment and Best Practices for Data Department Efficacy

Hosted By
Emanuele R. and 4 others

Details

📅 Date: 22-01-2025
📍 Location: Talentware @ WAO PL7, Via Luigi Porro Lambertenghi 7 (Isola/Zara/Garibaldi FS)

Agenda

  • 18:30 - Doors Open
  • 18:50 - Welcome to PyData
  • 19:00 - Talk 1: SURE: A New Privacy and Utility Assessment Library for Synthetic Data
    Speakers: Luca Gilli, Co-founder and CTO @ Clearbox AI; Dario Brunelli, ML Engineer @ Clearbox AI
  • 19:40 - Talk 2: Going beyond traditional software playbook to build an effective data department
    Speaker: Raffaele Bongo, Senior ML Engineer & Data Solutions Architect @ Talentware
  • 20:20 - Networking & Aperitivo offered by Talentware

Join us for dinner after the event to continue chatting!

Talks and Speakers

SURE: A New Privacy and Utility Assessment Library for Synthetic Data

In this talk, we will explore the core principles of validating and evaluating synthetic data across diverse domains, highlighting the importance of scalable and robust solutions. We will introduce SURE, an open-source library designed to assess privacy risks and utility trade-offs in synthetic datasets. By showcasing how SURE leverages the power of Polars to efficiently generate detailed and insightful reports, we’ll demonstrate its role in streamlining synthetic data assessment. Finally, we will discuss the current open challenges in its development and future directions.

Dario Brunelli ventured into the world of data with a Master’s in Quantum Machine Learning, leveraging his M.Sc. in Electrical Engineering and a dynamic background in motorsport. As a Machine Learning Engineer at Clearbox AI, he specializes in synthetic data generation, with a focus on data privacy and time-series generation. When he's not coding, Dario enjoys photography, cycling, and surfing.

Luca Gilli is the co-founder of Clearbox AI, a company specializing in synthetic data generation solutions to drive innovation and enhance data privacy. He holds a PhD in computational mathematics from Delft University of Technology. He worked as a scientific software developer for a consultancy firm for over 5 years in the Netherlands. Outside of work, he enjoys hiking in the beautiful Piedmontese mountains. He’s also passionate about farming, with a special interest in experimenting with hydroponic techniques.

Going beyond traditional software playbook to build an effective data department

The journey to AI maturity is full of challenges. In Italy, while 65-70% of companies have embraced digitalization and 60% utilize cloud computing, only 9-11% have achieved mature AI solutions. This disparity underscores the evolving nature of AI as a discipline and its distinction from traditional software development. With over 80% of AI projects failing due to unclear expectations, inadequate frameworks, and communication gaps, the need for a structured approach has never been more urgent. This talk introduces the three foundational pillars of Talentware’s Data Department:

  1. Stakeholder Evangelization: Ensuring that all stakeholders grasp the experimental and iterative nature of AI projects and the massive importance of data quality.
  2. Specific Operational Framework: Emphasizing clear business requirements, shared success metrics, and the delivery of incremental, high-value outputs.
  3. Clear and Frequent Communication: Establishing a shared glossary, providing regular updates, and using structured frameworks to convey the complexity and value of AI tasks effectively.

By addressing these pillars, the session highlights how AI projects differ from traditional software projects and why their success depends heavily on professionals at every level understanding this distinction.

Raffaele Bongo is a Senior Machine Learning Engineer and Data Solutions Architect at Talentware. Raffaele has extensive experience in startups and scale-ups, where he has specialized in building and deploying machine learning models and data pipelines on cloud platforms. Originally, Raffaele signed up for mechanical engineering but switched to computer science at the last minute because AI sounded cooler. Great call!

PyData Milano
WAO PL7 - Spazio Coworking
Via Luigi Porro Lambertenghi 7 · Milan