Git for Data: How Table Formats Unify Software and Data Development
## Details
Join PyData NYC at 125 W 25th St (Cockroach Labs) on October 15th at 6:00 pm for a talk night with Jacopo Tagliabue and Ciro Greco from Bauplan. Please sign up with your full official name and bring a government-issued ID.
🍕 Pizza and drinks sponsored by Bauplan and venue hosted by Cockroach Labs - thank you!
Agenda:
Git for Data:
Distributed version control systems - such as Git - unlock software development in multi-player mode: devs can safely work over the same code base, with standard (albeit perhaps not user-friendly!) abstractions for snapshotting, time-travel, and branching. Data folks have rarely been so lucky, as their projects crucially depend on data, whose life-cycle management is often cumbersome and custom. In this talk, we present open table formats - such as Apache Iceberg - to practitioners with little or no exposure to modern cloud infrastructure. In particular, we show how moving from datasets to tables unlocks a similar multi-player mode when building data pipelines, with equivalent abstractions for snapshotting, time-travel, and branching, and a unified backbone for pipelines, data science, and AI use cases.
## Speakers
Jacopo Tagliabue/Ciro Greco
Jacopo Tagliabue is the co-founder and CTO of Bauplan. Educated in several acronyms across the globe (UNISR, SFI, MIT), Jacopo was co-founder and CTO of Tooso, an AI startup acquired by Coveo (TSX: CVO) in 2019. He led Coveo's AI from scale-up to IPO, and built out Coveo Labs, a prolific R&D practice whose libraries, models, and datasets have garnered tens of millions of downloads. When not busy building products, he teaches MLSys at NYU and explores topics at the intersection of data, infrastructure, and AI. In previous lives, he managed to get a Ph.D., do sciency things for a pro basketball team, and simulate a pre-Columbian civilization.
Ciro Greco is co-founder and CEO at Bauplan, a serverless computing platform for complex data workloads. Formerly, he was the founder of Tooso, an NLP startup based in San Francisco. Tooso was acquired by Coveo in 2019, and Ciro was on the management team that brought Coveo to IPO in 2021. In a previous life, he earned a Ph.D. in Neuroscience at Milan-Bicocca, held a postdoctoral fellowship at Ghent University, and was a visiting scientist at MIT.
⚠️ Registration: Please RSVP here if you would like to attend: https://www.meetup.com/pydatanyc/events/310919622
All PyData NYC events are governed by the NumFOCUS Code of Conduct.