Tackling Data Challenges at Netflix and Twitter

Hosted by Pete Soderling

Details

Netflix is transforming the way the world watches television. The company leads the video streaming business and, as it grows globally, is finding new ways to process data at scale. Netflix will be joined by Twitter to host this data engineering Meetup, focused on how our companies solve the challenge of transforming and logging data at scale and in real time.

Note: this is our first meetup in the South Bay, so please spread the word to friends who might live there, too!

Agenda:

6:00 - 7:00 Registration, Happy Hour & Networking
7:00 - 8:30 Presentations
8:30 - 9:00 Q&A, Networking, Wrap-up

Talk 1: You Are What You Log - Attacking Big Data at the Source

Presenter: Jason Reid, Netflix

One area of modern data pipelines that has lacked attention is the logger itself. However, by providing a richer logging API, we can enable source applications to do much of the heavy lifting for us, eliminating the need for expensive downstream joining and aggregation.

Talk 2: TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies

Presenter: Anirudh Todi, Twitter

Twitter’s 300+ million users generate tens of billions of tweet views per day. Aggregating these events in real time – in a robust enough way to incorporate into our products – presents a massive scaling challenge. In this talk I’ll introduce TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones. I’ll discuss how we built TSAR using Python and Scala from the ground up, almost entirely on open-source technologies (Storm, Summingbird, Kafka, Aurora, and others), and describe some of the challenges we faced in scaling it to process tens of billions of events per day.

Talk 3: Big Data Processing @Scale

Presenter: Dan Weeks, Netflix

Take a deep dive into the internals of the Netflix Big Data Platform and discover what it takes to process petabytes of data every day in the cloud. Learn how we deploy Spark and Presto for everything from ETL to ML and leverage Parquet for advanced storage and processing.

Additional info is available here (https://netflixdataengmeetup.splashthat.com/).

(Note: this is a joint event in association with our friends over at the SF Data Science (https://www.meetup.com/SF-Data-Science) meetup. We encourage you to check out their group, too!)

Data Council SF Data Engineering & Science
Netflix
121 Albright Way · Los Gatos, CA