Big Data Integration, Management, Compliance: Apache Gobblin, Dali and friends


Details
In this edition of the Big Data Meetup, we focus on the Data Integration and Management ecosystems at different companies that are powered by Apache Gobblin and related technologies.
Agenda
6:00 - 6:30 pm - Check in, Food, Drinks & Networking
6:30 - 6:35 pm - Introductions
6:35 - 6:45 pm - Talk #1: Apache Gobblin: The Latest News (10 mins)
6:50 - 7:10 pm - Talk #2: How We Gobble Data at Prezi (20 mins)
7:15 - 7:30 pm - Talk #3: Foundations for a Data-Driven Marketing Engine at Machine Zone (15 mins)
7:35 - 8:00 pm - Talk #4: Balancing Data Democracy with Data Privacy: The LinkedIn Story with Gobblin, Dali and WhereHows (25 mins)
8:00 - 8:30 pm - Networking and wrap up
Please RSVP so we get an accurate headcount for food and drinks.
Talk Details
Talk #1: Apache Gobblin: The Latest News (Abhishek Tiwari)
Exciting new developments in the Apache Gobblin world. A sneak peek into how Gobblin has evolved from a platform to an ecosystem, and the community's progress towards adopting the Apache way.
Abhishek is a PPMC member and committer of Apache Gobblin.
Talk #2: How We Gobble Data at Prezi (Tamas Nemeth)
If you want to do any kind of analysis, you need to collect data. Even though everybody is doing it, nobody is really talking about it. We at Prezi have burned ourselves a couple of times with it. Tamas will talk about how Prezi started its data ingestion journey, and why we chose Apache Gobblin as one of our key components. We’ll describe how we run Gobblin in AWS to move data to S3 and make it available for those who want to understand what our 85 million users need. Oh, and of course you will hear about our challenges, as we love war stories.
Tamas Nemeth is responsible for the technology stack of Prezi’s data infrastructure and he makes sure it is reliable and a joy to work with.
Talk #3: Foundations for a Data-Driven Marketing Engine at Machine Zone (Michael Dreibelbis)
Machine Zone (MZ) is reinventing how the entire world experiences data via our mobile games division MZ Games Studios, our digital marketing division Cognant, and our live data platform division Satori. Attend this talk to learn how we use Gobblin here at MZ to ingest data from many diverse sources at scale.
Michael is on the Data Platform core team at MZ that manages ETL pipelines for MZ's Cognant Marketing Engine.
Talk #4: Balancing Data Democracy with Data Privacy: LinkedIn’s Data Management Platform (Eric Ogren and Anthony Hsu)
How do you give your data scientists unfettered access to data while preserving the privacy of your members, who have entrusted you with their data?
Eric Ogren and Anthony Hsu outline the path LinkedIn has taken to protect member privacy in its scalable distributed data ecosystem built around Kafka and Hadoop. They discuss three foundational building blocks for scalable data management: a centralized metadata system (WhereHows), a standardized data movement platform (Gobblin), and a unified data access layer (Dali).
Eric and Anthony are on the Analytics Platform team at LinkedIn working on data management infrastructure.
