Big Data at Uber


Details
About the presentation
Did you know that Uber has an office in Louisville? Do you want to hear about the great things they are building? Do you want to see their office? Well, now you can. Join us for a couple of great presentations that give a glimpse into how Uber approaches big data problems.
Hope to see you there!
Agenda
6:00 – 6:30 - Socialize over food and drink
6:30 – 6:45 - Announcements, Upcoming Events
6:45 – 8:00 - Presentations
8:00 – 8:30 - Q&A
Presentations
Presentation #1:
Presenter: Sudhir Tonse
Presentation title: Big Data Pipeline and ML at Uber
Uber's Marketplace organization runs the logistics platform that makes Uber's on-demand services work. To run the Marketplace efficiently, we need to ingest, aggregate, store, and analyze data, both in real time and as batch processes. This talk will walk you through some of the use cases and the high-level architecture of the data pipelines, processing, and machine learning at the core of Uber's Marketplace.
Bio:
Sudhir Tonse manages the Marketplace Data Intelligence organization at Uber, which is responsible for the Data Pipeline and compute framework as well as Insights and Analytics. Previously, Sudhir managed the Cloud Platform Infrastructure team at Netflix and was responsible for many of the services and components that form the microservices-based Cloud Platform. Sudhir is a weekend golfer and tries to make the most of the wonderful California weather and public courses.
Presentation #2:
Presenter: Cody Yancey
Presentation title: Using Zeppelin to run SQL queries across Avro, Parquet, and other data sources
Zeppelin lets you run ad-hoc, free-form SQL queries across Avro, Parquet, Cassandra, JDBC, and more from a convenient web app, which is great for data analysis and prototyping. We've dockerized the web app and implemented an HDFS backup system to store the code and settings, which lets us provision personal Zeppelin servers for our developers on Aurora/Mesos from a self-service web tool.
Bio:
Cody is a software engineer on the Uber Maps Infrastructure team. He works on PaaS architecture using open source software for Uber's Big Data processing needs. He has worked on Software Defined Network systems, Cassandra, and Spark on Mesos, and most recently has been advocating Zeppelin for prototyping and ad-hoc data analysis.
Presentation #3:
Presenter: Scott Short, Collections Team Lead, Uber Maps
Presentation title: Case Study: Optimal HDFS File Formats within Spark
Description:
The goal of this session is to help you navigate the myriad of file format choices for distributed storage. The Uber Map CARs team leverages three different file formats within their Spark-based solution. In this session Scott will discuss the design decisions behind each of these choices and the lessons learned.
Bio:
Scott has developed big data solutions for over a decade. At Uber, Scott leads efforts to build Spark-based solutions to improve the quality of maps. Prior to Uber, Scott was a founding member of the Image Processing Framework team, which built a large-scale distributed storage and compute system used to generate Bing Maps. Scott also built solutions for Bing Local Search on top of Cosmos, Microsoft’s Map-Reduce implementation.
Uber is hiring!!
Uber's Maps Data Engineering Team is helping create the next generation of mapping and sensing technologies. The work we do directly impacts Uber's mission of bringing safe, reliable transportation to everyone, everywhere. Our team is composed of world-class engineers with decades of software development and geospatial experience. We're looking for exceptional engineers who can work faster and smarter without sacrificing technical excellence.
Please note: all attendees will need to sign a nondisclosure agreement (NDA) to enter the Uber office premises.
