Data Ingest at Scale - Lessons from PlanetLabs and Uber

Details
At this meetup we'll hear about a truly 'far out' data ingest use case. How far out? Try 400 km up in the sky!
The altitude giving you vertigo? Not to worry: there will also be stories of petabytes of data!
This event is sponsored by StreamSets (http://streamsets.com)
https://upload.wikimedia.org/wikipedia/commons/3/32/Planet_Labs_satellite_launch_from_ISS.jpg
6pm - 6.30pm - Pizza/Beer and Networking.
6.30pm - 7.15pm - Alex Newman, Data Janitor at PlanetLabs (https://www.planet.com/)
Planet is on a mission: to image the world every day. In addition, our customers want to detect, in an instant, when anything in the world changes. To achieve this, Planet operates more spacecraft than any other operator, runs ground stations around the world, and can cram hundreds of spacecraft onto a single rocket. These spacecraft create petabytes of imagery data that we analyze, rectify, and scan for changes in real time. We have developed extremely reliable, bleeding-edge software to manage our mounds of data, while still relying on tried and more mature technologies to keep our satellites flying. This talk will provide an overview of our infrastructure.
Since we have to scale to billions of images, each with hundreds of kilobytes of metadata, Planet has built a unique storage system that provides the benefits of a relational database management system, including PostGIS capability, while scaling into the cloud far beyond what traditional databases can handle. In addition, we have special compliance requirements around export control, and we are built on cloud hardware, which does not scale as well as dedicated hardware. Technologies in this storage system include:
- S3
- PostgreSQL
- Aurora
- Kinesis
- Elasticsearch
- StreamSets
We will be releasing some of the code we use to glue these systems together, along with guidance on how users can put this released code to work in their own systems right away.
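The architecture described above follows a common pattern: large binary imagery lives in object storage (S3), while each scene's metadata, including its geospatial footprint, is indexed relationally. A minimal sketch of that pattern is below; it is illustrative only, not Planet's released code, with sqlite3 standing in for PostgreSQL/Aurora, a dict standing in for an S3 bucket, and GeoJSON text standing in for a PostGIS geometry column.

```python
import json
import sqlite3

# Stand-in for an S3 bucket: object key -> raw bytes.
object_store = {}

# Stand-in for PostgreSQL/Aurora: one metadata row per image,
# with a pointer (s3_key) to the pixels in object storage.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE images (
        image_id  TEXT PRIMARY KEY,
        s3_key    TEXT NOT NULL,   -- pointer to the pixels in object storage
        captured  TEXT NOT NULL,   -- acquisition timestamp
        footprint TEXT NOT NULL    -- GeoJSON; PostGIS would store a geometry
    )
""")

def ingest(image_id, pixels, captured, footprint):
    """Write pixels to object storage, then record metadata relationally."""
    key = f"imagery/{image_id}.tif"
    object_store[key] = pixels
    db.execute(
        "INSERT INTO images VALUES (?, ?, ?, ?)",
        (image_id, key, captured, json.dumps(footprint)),
    )
    return key

key = ingest("scene-001", b"\x00\x01", "2016-05-01T12:00:00Z",
             {"type": "Point", "coordinates": [-122.4, 37.8]})
row = db.execute(
    "SELECT s3_key FROM images WHERE image_id = 'scene-001'"
).fetchone()
print(row[0])  # imagery/scene-001.tif
```

The design choice this illustrates: queries (by time, by footprint) hit the small relational index, and only matching scenes pull heavyweight pixel data out of object storage.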
7.15pm - 8pm - Jae Hyeon Bae, Software Engineer - Uber
Jae will talk about how Uber's marketplace dynamics team uses Elasticsearch as its main storage and query engine. In particular, he will focus on why Elasticsearch was chosen, how the team implemented reliable data ingestion on a stream processing engine, and various lessons learned taming Elasticsearch with large-scale data processing.
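Reliable ingestion into a search engine from a stream processor typically means batching writes and retrying transient failures with backoff, dead-lettering anything that still fails. The sketch below shows that general pattern only; it is not Uber's implementation, and `send` is a hypothetical stand-in for an Elasticsearch bulk request.

```python
import time

def bulk_ingest(records, send, batch_size=2, max_retries=3, backoff=0.01):
    """Batch records and retry each failed batch with exponential backoff.

    `send` stands in for an Elasticsearch bulk request; batches that still
    fail after max_retries are returned for dead-letter handling.
    """
    failed = []
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        for attempt in range(max_retries):
            try:
                send(batch)
                break  # batch accepted
            except ConnectionError:
                time.sleep(backoff * 2 ** attempt)  # exponential backoff
        else:
            failed.append(batch)  # dead-letter after exhausting retries
    return failed

# Simulate a sink whose first request fails transiently.
calls = {"n": 0}
def flaky_send(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient")

records = [{"id": i} for i in range(4)]
failed = bulk_ingest(records, flaky_send)
print(len(failed))  # 0 -- the retry absorbed the transient failure
```

Keeping the retry loop per batch (rather than per record) matches how bulk APIs amortize request overhead, at the cost of re-sending a whole batch on a transient error.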
Speaker Bios:
Alex Newman has contributed to much of the Hadoop ecosystem. As one of the early members of the Cloudera engineering staff, he has extensive experience in all things big data. He has built and sold database software companies and now works at PlanetLabs, helping manage the largest spacecraft network in the world.
Jae Hyeon Bae is a software engineer at Uber. He is currently in charge of overall data engineering for Uber marketplace dynamics. Prior to joining Uber, he worked on building Netflix's main data pipeline, revamping its first generation under the name Suro and laying the groundwork for real-time analytics with Kafka, Elasticsearch, and Druid.
