## Open Lakehouse & AI Data Infrastructure Meetup – NYC
📅 Tuesday, February 17
⏰ 6:00 PM – 8:30 PM EST
📍 Registration is accepted only through the Luma link:
https://luma.com/gmc5k8gr
📌 New York, New York
We’re bringing together the Apache Iceberg, Lance, and Apache DataFusion communities in NYC for an evening of deep technical discussions on open lakehouse architectures and modern data infrastructure, hosted at Cloudflare’s NYC office.
This meetup is a great opportunity to learn from industry experts, connect with fellow data engineers and AI practitioners, and explore how open technologies are shaping the future of analytics and AI.
***
## 🤝 Hosted By
Cloudera | LanceDB | Cloudflare
***
## 🎤 Agenda
### 🕕 6:00 – 6:30 PM
Registration & Networking
***
### 🗣️ Talk 1: Apache Iceberg – Spec Evolution (v1 to v4) and How Cloudera’s Data Platform Supports It
Speaker: Dipankar Mazumdar, Director – Developers, Cloudera
This session explores the evolution of Apache Iceberg, the challenges each version of the specification aimed to solve, and what’s coming next. After a brief overview of v1 and v2, we’ll dive deep into v3 and the upcoming v4+ work, including:
- Lineage
- Deletion vectors
- Metadata redesign
- File format APIs
- Why these changes matter for large-scale lakehouse pipelines
The talk will also cover how Cloudera’s data platform has supported Iceberg’s core capabilities from its early days.
***
### 🗣️ Talk 2: Multimodal AI Lakehouse with Lance & LanceDB
Speaker: Chang She, Co-Founder & CEO, LanceDB
Modern AI applications demand seamless access to text, images, embeddings, and other complex data types, but existing lakehouse solutions often force teams into closed systems, reintroducing data silos.
In this talk, you’ll learn about:
- Lance, a next-generation columnar data format optimized for AI
- LanceDB, a multimodal lakehouse built on top of Lance
- Unified vector, full-text, and SQL search
- Flexible schema evolution across the multimodal AI lifecycle
You’ll also see how companies like Midjourney, World Labs, and Runway are building open, scalable, production-grade AI systems.
***
### 🗣️ Talk 3: Cloudflare’s Data Platform with Apache Iceberg & DataFusion
Speaker: Jonathan Chen, Software Engineer, Cloudflare
An introduction to Cloudflare’s new data platform, built on Apache Iceberg and Apache DataFusion, including:
- R2 Data Catalog
- R2 SQL
- Pipelines
We’ll walk through the architecture and show how Cloudflare enables SQL analytics directly on object storage, allowing users to query continuously ingested data without managing separate compute or storage systems.
***
### 🤝 8:00 – 8:30 PM
Networking & Conversations
***
## ⚠️ Important Note on Registration
Please ensure that the name used for registration exactly matches the full name on your government-issued ID. This is required for building security access.
***
## 🎯 Who Should Attend?
- Data engineers & architects
- AI/ML practitioners
- Open-source contributors
- Developers building lakehouse or analytics platforms
Looking forward to an evening of learning, sharing, and networking with the NYC data community!