19. A Philosophy of Building Data Pipelines

Hosted by Rafał Wojdyła

Details

Agenda:

• 17.45: drink, socialize

• 18.00: first talk: A Philosophy of Building Data Pipelines

Speaker: Josh Wills, Director of Data Science at Cloudera and creator of Apache Crunch.

http://photos3.meetupstatic.com/photos/event/b/b/a/0/600_421428032.jpeg

Before Cloudera, Josh worked as a software engineer and statistician at Google, IBM, and a couple of startups. He earned his Bachelor's degree in Mathematics from Duke University and his Master's in Operations Research from the University of Texas at Austin.

Abstract: If you ask a data scientist what gift they would like for their birthday, a lot of them would ask for one really good data engineer, which I define as a software engineer who specializes in large-scale data transformation and integration. I like to think of data engineering as a new discipline within computer science that has emerged as a result of the rise of large-scale data processing technologies based on Apache Hadoop. In this talk, I will outline my thoughts about this new discipline, including how I think about building robust, scalable pipelines on top of Hadoop using tools like Apache Crunch and Apache Spark, as well as the most effective ways for data scientists and data engineers to collaborate to solve problems.

• 18.45: eat, drink, socialize (more)

• 19.00: second talk: Scrub - Spotify CRUnch Batch processing

Speaker: Mārtiņš Kalvāns, Software Developer at Spotify

At Spotify, Mārtiņš is a key member of DataEx, the team responsible for developer experience, specifically for engineers working with data. The main product of DataEx is Scrub, a comprehensive framework for batch data pipelines at Spotify. Mārtiņš earned his Master's in Computer Science from the University of Latvia.

Abstract: At Spotify we have decided to move away from Hadoop Streaming to a more mature and stable batch processing technology. After a thorough evaluation of different tools, we have chosen Apache Crunch as the official, supported framework. In this talk I will present the reasons and statistics behind the decision, as well as the real problems we have met and solved on the way to better data pipelines at Spotify.

• 19.45: drink, socialize (even more) & have a chance to win free O'Reilly books

Food and drinks will be provided by Spotify and O'Reilly Media (thanks!).

Follow SHUG on Twitter (https://twitter.com/shug_meetup)!

Strata + Hadoop World 2015 discount:

To receive a 20% discount on Strata + Hadoop World 2015, use code SHUG20. More info about the conference is available here (http://oreil.ly/UK15SHW).

https://lh6.googleusercontent.com/3MW5SlMfg4FdneFdmwRyJHTngSxE-cZX62dLciQR0_QAblH2tn0UH7m7l-BTE25vzbMtiMqb4TNKp0ec0gOYu-jQeUgcLxZyHaNxuI_CjFqgcpYgXcQyHBo8kvGKqIeRA__JEmg

UPDATE: New and improved, with 100% more LiveStream!

Link: https://livestream.com/spotify/events/4024842

Password: hs821jd92kd

Stockholm Hadoop User Group
Spotify Office
Birger Jarlsgatan 61 (11tr) · Stockholm