
Intelligent Catalog Meetup at Uber

Hosted By
Uber E.

Details

You are invited to join us for an in-person Meetup hosted by Uber on how catalogs are used across the industry. We will kick off with opening remarks by Shanshan Song, Uber’s Senior Director of Storage, Search, and Data, followed by a panel discussion and four talks by speakers from Uber, OpenAI, Databricks, and Datastrato.

Event Details

  • This Meetup is co-hosted by Uber
  • This Meetup is an in-person only event
  • Registration is required. Please RSVP and answer the registration questions (full name and email address)
  • The event is in Sunnyvale (not at the location shown by the event map pin). Location details will be emailed a few days before the event to registrants who provided an email address

Event Agenda

  • 5:30 ~ 5:40 PM Welcome and Opening: Shanshan Song, Senior Director of Storage, Search, and Data @ Uber
  • 5:40 ~ 6:20 PM Panel Discussion
      • Panel Speaker 1: Junping Du, CEO of Datastrato
      • Panel Speaker 2: Jing Zhao, Principal Engineer of Data Platform @ Uber
      • Panel Speaker 3: Chao Sun, Member of Technical Staff @ OpenAI
      • Panel Speaker 4: Jason Reid, Data Engineering Advocate @ Databricks
  • 6:20 ~ 6:40 PM Happy Hour and Networking
  • 6:40 ~ 7:00 PM Talk 1: Beinan Wang - Uber
  • 7:00 ~ 7:20 PM Talk 2: Jason Reid - Databricks
  • 7:20 ~ 7:40 PM Talk 3: Chao Sun & Cheng Su - OpenAI
  • 7:40 ~ 8:00 PM Talk 4: Jerry Shao - Datastrato

Presentation Info

  • Powering Region-Agnostic Machine Learning with a Universal Data Catalog
    Modern machine learning workloads demand ever more GPU resources, but GPUs are scarce and the data they consume is spread across distributed storage. In particular, GPUs are frequently located far from the data they need to process, which may reside in on-premises HDFS or in cloud storage such as GCS. This talk addresses the problem by introducing a solution that uses a catalog to provide a region-agnostic interface. The catalog abstracts away the underlying data location, so training pipelines can access data from either on-premises HDFS or GCS in the cloud without changes. By decoupling compute from storage location, this approach improves GPU utilization and streamlines training in distributed environments.
  • Toward a Unified Data Catalog at OpenAI
    In this talk, we’ll share how OpenAI is building a unified data catalog as part of its evolving data infrastructure. We’ll cover the technical journey of abstracting over Databricks Unity Catalog to enable seamless transitions between Databricks Spark and open-source Spark. Along the way, we’ll dive into our caching layer for Delta Lake metadata and the unified access control system we developed. We’ll also discuss our long-term vision: moving toward a self-hosted, open-source catalog backed by open table formats, enabling flexibility, performance, and vendor neutrality at scale.
  • Advancing Toward an Intelligent and Agentic Data Architecture with Apache Gravitino
    Data stacks have advanced over the past 20 years to handle big data’s "3Vs" (Volume, Velocity, Variety), but their query-centric design remains a bottleneck in the generative AI era. We propose a shift to metadata-centric architectures, exemplified by Apache Gravitino, to unlock interpretability, intelligence, and adaptability. This presentation demonstrates how Gravitino’s metadata-driven approach enables agentic data systems, with real-world examples of collaborative data agents solving governance and management challenges. We also explore future integrations with LLMs that automate and evolve data ecosystems into truly intelligent platforms.
  • Building an Interoperable Lakehouse with Unity Catalog
    What connects your lakehouse to real data intelligence? The answer: the catalog. Today's data-driven organizations place high demands on their lakehouse catalogs: everything from support for multiple table formats, to advanced governance and compliance requirements, to end-to-end lineage between their AI models and the source data that powers them. Unity Catalog is purpose-built for the lakehouse and goes beyond operational or business catalogs to deliver cross-platform interoperability and a shared understanding of the entire data estate.
Uber Engineering Events - San Francisco
FREE