Intelligent Catalog Meetup at Uber

Name: Intelligent Catalog Meetup at Uber
Start: 2025-06-03T17:30:00-07:00
End: 2025-06-03T20:00:00-07:00
Location: Sunnyvale

Hosted By

Uber E.

Details

We welcome you to join us for an in-person Meetup hosted by Uber on how catalogs are used across the industry. We will kick off with opening remarks by Shanshan Song, Uber’s Senior Director for Storage, Search, and Data, followed by a panel discussion and a series of four talks by speakers from Uber, OpenAI, Databricks, and Datastrato.

Event Details

This Meetup is co-hosted by Uber
This Meetup is an in-person only event
Registration is required for the Meetup. Please RSVP & answer the questions (full name & email address)
The event is in Sunnyvale (not at the location shown by the event pin). Location details will be emailed a few days before the event to registrants who provided an email address

Event Agenda

5:30 ~ 5:40 PM Welcome and Opening: Shanshan Song, Senior Director of Storage, Search, and Data @ Uber
5:40 ~ 6:20 PM Panel Discussion
Panel Speaker 1: Junping Du, CEO of Datastrato
Panel Speaker 2: Jing Zhao, Principal Engineer of Data Platform @ Uber
Panel Speaker 3: Chao Sun, Member of Technical Staff @ OpenAI
Panel Speaker 4: Jason Reid, Data Engineering Advocate @ Databricks
6:20 ~ 6:40 PM Happy Hour and Networking
6:40 ~ 7:00 PM Talk 1: Beinan Wang - Uber
7:00 ~ 7:20 PM Talk 2: Jason Reid - Databricks
7:20 ~ 7:40 PM Talk 3: Chao Sun & Cheng Su - OpenAI
7:40 ~ 8:00 PM Talk 4: Jerry Shao - Datastrato

Presentation Info

Powering Region-Agnostic Machine Learning with a Universal Data Catalog
The increasing demand for GPU resources in modern machine learning workloads is often challenged by their scarcity and the distributed nature of data and storage. Specifically, GPUs are frequently located remotely from the data they need to process, which may reside in on-premises HDFS or cloud-based storage like GCS. This talk addresses this challenge by introducing a solution that leverages a Catalog to provide a region-agnostic interface. This approach abstracts the underlying data location, enabling training pipelines to seamlessly access data from either on-premises HDFS or GCS in the Cloud. By decoupling compute from storage location, this solution enhances GPU utilization and streamlines the training process in distributed environments.
Toward a Unified Data Catalog at OpenAI
In this talk, we’ll share how OpenAI is building a unified data catalog as part of its evolving data infrastructure. We’ll cover the technical journey of abstracting over Databricks Unity Catalog to enable a seamless transition between Databricks Spark and open-source Spark. Along the way, we’ll dive into our caching layer for Delta Lake metadata and the unified access control system we developed. We’ll also discuss our long-term vision: moving toward a self-hosted, open-source catalog backed by open table formats, enabling flexibility, performance, and vendor neutrality at scale.
Advancing Toward an Intelligent and Agentic Data Architecture with Apache Gravitino
While data stacks have advanced over the past 20 years to handle big data’s "3Vs" (Volume, Velocity, Variety), their query-centric design remains a bottleneck for the generative AI era. We propose a shift to metadata-centric architectures—exemplified by Apache Gravitino—to unlock interpretability, intelligence, and adaptability. This presentation demonstrates how Gravitino’s metadata-driven approach enables agentic data systems, with real-world examples of collaborative data agents solving governance and management challenges. We also explore future integrations with LLMs to automate and evolve data ecosystems into truly intelligent platforms.
Building an interoperable Lakehouse with Unity Catalog
What connects your lakehouse to real data intelligence? The answer: the catalog. Today's data-driven organizations have high demands for their lakehouse catalogs: everything from support for multiple table formats to advanced governance and compliance requirements to end-to-end lineage between their AI models and the source data that powers them. Unity Catalog is purpose-built for the lakehouse and goes beyond operational or business catalogs to deliver cross-platform interoperability and a shared understanding of the entire data estate.
