Part of Future of Data - 21 groups

Future of Data: San Francisco

5.0•26 ratings

San Francisco, CA, US

About us

This meetup focuses on the Future of Data and the open community data projects governed by the Apache Software Foundation. It is geared toward developers, data scientists, and all data enthusiasts who are building modern data applications. Our meetups cover all data, both data-in-motion and data-at-rest. They provide an opportunity to listen, share, and work hands-on with other technologists using the open source, open community Apache tools.

Sponsors

Cloudera

Cloudera Enterprise Data Hub | From The Edge To AI

Upcoming events

1

AI Lakehouse Meetup - Bay Area
Wed, Jul 15 · 5:00 PM PDT
Cloudera San Jose Office, San Jose, CA
Join us for an exciting evening of deep dives into next-generation data infrastructure, open table formats, and architectural frameworks purpose-built for production-grade AI applications and autonomous agents.
Whether you are looking to unify multimodal data silos or scale complex agentic reasoning workloads over enterprise data systems, this meetup brings together top engineering minds from Cloudera, LanceDB, Google Cloud, and PuppyGraph to share live demos, architecture breakdowns, and real-world production insights.

🗓️ Event Details:

Date: Wednesday, July 15, 2026

Time: 05:00 PM to 08:30 PM PDT

Format: In-Person with Socialive Broadcast

Location: Cloudera San Jose Office, 6220 America Center Dr, 5th Floor, San Jose, CA 95002

⏰ Agenda

5:00 PM – 5:30 PM: Registration & Settling In

5:30 PM – 6:00 PM: Talk 1: Building the Multimodal Lakehouse for AI with LanceDB

6:00 PM – 6:30 PM: Talk 2: Putting Agents in your Data Platforms - Are we Ready?

6:30 PM – 7:00 PM: Talk 3: Agent Context at Scale: Graph + SQL on Apache Iceberg

7:00 PM – 7:30 PM: Talk 4: Architecting the AI-Native, Cross-Cloud Lakehouse

7:30 PM – 8:30 PM: Networking & Snacks 🍕✨

📚 Session Breakdowns & Speakers

#### Talk 1: Building the Multimodal Lakehouse for AI with LanceDB

The next wave of AI applications demands seamless, scalable access to text, images, embeddings, and other complex modalities—but current lakehouse solutions still force teams into closed systems for vector search, full-text search, or feature engineering, reintroducing data silos. In this talk, we introduce Lance, a next-generation columnar data format optimized for AI, and LanceDB, the multimodal lakehouse built on top of it. Together, they provide low-latency access, unified vector, full-text, and SQL search, and flexible schema evolution across the entire multimodal AI lifecycle. From application serving to feature engineering and large-scale training, learn how innovators build open, performant, and production-grade multimodal systems at scale.

Speakers: ChanChan Mao (DevRel @ LanceDB) & Lu Qiu (Database Engineer @ LanceDB)

#### Talk 2: Putting Agents in your Data Platforms - Are we Ready? (with Apache Iceberg & Cloudera AI)

Data platforms traditionally use deterministic pipelines for predictable query patterns, but Agentic AI introduces an execution model where agents dynamically explore data systems by probing schemas, issuing iterative queries, validating hypotheses, and refining their approach based on intermediate results. This session will cover the architectural primitives required to manage these unpredictable workloads, the core building blocks for isolation, context, governance, and auditability, and how Apache Iceberg's snapshot-based storage and branching semantics support building robust agentic workflows on enterprise data platforms.

Speaker: Dipankar Mazumdar (Director-Developers @ Cloudera)
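
The snapshot and branching idea mentioned in the Talk 2 abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyTable`, its methods, and the `agent-run-42` branch name are hypothetical and are not the Apache Iceberg API or anything the speaker has confirmed. It shows why immutable snapshots plus named branch pointers let an agent write in isolation and publish only after validation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model of snapshot-based storage with branches (NOT the Iceberg API)."""

    # Snapshot id -> immutable tuple of rows; snapshot 0 is the empty table.
    snapshots: list = field(default_factory=lambda: [()])
    # Branch name -> snapshot id it currently points to.
    branches: dict = field(default_factory=lambda: {"main": 0})

    def read(self, branch="main"):
        return self.snapshots[self.branches[branch]]

    def append(self, rows, branch="main"):
        # Every write creates a new immutable snapshot; older ones are kept.
        self.snapshots.append(self.read(branch) + tuple(rows))
        self.branches[branch] = len(self.snapshots) - 1

    def create_branch(self, name, from_branch="main"):
        # A branch is only a new pointer to an existing snapshot: no data copy.
        self.branches[name] = self.branches[from_branch]

    def fast_forward(self, target, source):
        # Publish by moving the target pointer to the source's snapshot.
        # Assumes the target has not changed since the source was branched.
        self.branches[target] = self.branches[source]


table = ToyTable()
table.append([("order-1", 120)])

# The agent works on its own branch, so "main" readers never see its writes.
table.create_branch("agent-run-42")
table.append([("order-2", -5)], branch="agent-run-42")

main_view = table.read("main")
agent_view = table.read("agent-run-42")

# Hypothetical validation rule: publish only if every amount is non-negative.
if all(amount >= 0 for _, amount in agent_view):
    table.fast_forward("main", "agent-run-42")
```

In this run the validation fails, so `main` keeps pointing at its original snapshot while the agent's snapshot remains available for auditing.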

#### Talk 3: Agent Context at Scale: Graph + SQL on Apache Iceberg

Agentic systems place new demands on data infrastructure: scalability, performance, and guardrails to keep agents grounded in accurate context. At the same time, they push natural language interfaces beyond text-to-SQL, freeing retrieval to use the right tool for the right job. In this talk, we introduce a pluggable text-to-insight framework built on Apache Iceberg that runs both SQL and Cypher over the same underlying data, giving agents richer context for better reasoning without duplication or new silos. We’ll end with a live proof-of-concept demo showing it in action.

Speaker: Jaz Ku (Solution Architect @ PuppyGraph)
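
To make "SQL and graph queries over the same underlying data" from the Talk 3 abstract concrete, here is a minimal sketch using only Python's standard library. It is an illustration under assumptions, not the framework from the talk: an in-memory SQLite table stands in for an Iceberg table, a breadth-first traversal stands in for a Cypher query, and the `transfers` table, its rows, and both helper functions are hypothetical. Both access paths read the same rows, so nothing is duplicated into a separate graph store.

```python
import sqlite3
from collections import deque

# One copy of the data: each row is both a relational record and a graph edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (src TEXT, dst TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [("alice", "bob", 50.0), ("bob", "carol", 20.0), ("dave", "erin", 70.0)],
)


def total_sent(account):
    """Relational view: aggregate with plain SQL."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transfers WHERE src = ?",
        (account,),
    ).fetchone()
    return row[0]


def reachable(start):
    """Graph view: breadth-first traversal over the same rows, read as edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        edges = conn.execute(
            "SELECT dst FROM transfers WHERE src = ?", (node,)
        ).fetchall()
        for (dst,) in edges:
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen - {start}


alice_total = total_sent("alice")
alice_reach = reachable("alice")
```

An agent could choose the aggregate for "how much did alice send?" and the traversal for "who is downstream of alice?", which is the "right tool for the right job" idea in the abstract.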

#### Talk 4: Architecting the AI-Native, Cross-Cloud Lakehouse

Adopting open table formats like Apache Iceberg has historically meant navigating a trade-off between true open interoperability and the operational ease of a fully managed platform. In this session, we’ll explore how to architect a borderless, cross-cloud data foundation built for the agentic era. We will dive into how Google Cloud’s Lakehouse architecture leverages the open Iceberg REST Catalog to provide a unified metadata layer across any compatible engine (BigQuery, Managed Spark, or Trino). Finally, we’ll demonstrate how to pair this open foundation with GCP's Knowledge Catalog to transform passive Iceberg metadata into an active semantic knowledge engine for AI agents.

Speaker: Vinod Ramachandran (Google)

🎟️ RSVP & Important Notes
Space is limited at the Cloudera San Jose Office.
Registration through Meetup does not guarantee admission. Please register through the official Luma page (https://luma.com/n8aycq3j) for consideration and event updates.

We look forward to seeing you there! 🙌

2 attendees

Past events

34

Organizers

Future of Data and 3 others

Members

1,190

Sponsors

Cloudera

Cloudera Enterprise Data Hub | From The Edge To AI

hortonworks.com

Related topics

Internet of Things (IOT)

Machine Learning