Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Details
Uber’s mission is to ignite opportunities by setting the world in motion. To fulfill this mission, Uber relies heavily on data-driven decisions in every product area. That means storing and processing an ever-increasing amount of data while providing faster, more reliable, and more performant access to it.
This talk will reflect on the challenges of scaling Uber’s Big Data Platform to ingest, store, and serve 100+ PB of data with minute-level latency while using our hardware efficiently. We will provide a behind-the-scenes look at the current data technology landscape at Uber, including open-source technologies (e.g., Hadoop, Spark, Hive, Presto, Kafka, Avro) as well as solutions built in-house and later open-sourced, such as Hudi and Marmaray. We’ll dive into the technical details of how our ingestion platform was re-architected to bring in 10+ trillion events/day, with 100+ TB of new data/day, at minute-level latency; how our storage platform was scaled to reliably store 100+ PB of data in the data lake; and how our processing platform was designed to efficiently serve millions of queries and jobs/day while processing 1+ PB per day.
You’ll leave the talk with greater insight into how data truly powers each and every Uber experience and will be inspired to re-envision your own data platform to be more extensible and scalable.
Agenda:
6:20 pm - 6:30 pm Arrival and socializing
6:30 pm - 6:40 pm Opening
6:40 pm - 7:50 pm Reza Shiftehfar, "Uber’s Big Data Platform: 100+ Petabytes with Minute Latency"
7:50 pm - 8:00 pm Q&A
About Reza Shiftehfar:
Reza Shiftehfar currently leads Uber’s Hadoop Platform team. His team helps build and grow Uber’s reliable and scalable Big Data platform, which serves petabytes of data using technologies such as Apache Hadoop, Apache Hive, Apache Kafka, Apache Spark, and Presto. Reza is one of the founding engineers of Uber’s data team and helped scale Uber’s data platform from a few terabytes to over 100 petabytes while reducing data latency from 24+ hours to minutes. Reza holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.
