Title: "Scalding the Crunchy Pig for Cascading into the Hive": Evaluating the pros and cons of popular Hadoop processing tools and frameworks.
Speaker: David Whiting, Data Engineer at Spotify
Abstract: Cascading, Scalding, Cascalog, Crunch, Scrunch, Pig, Hive - there's a plethora of options when it comes to processing your data in Hadoop, and there's always somebody with a strong opinion about which one is best for each occasion. It's often hard to get a sense of how they differ from each other and how they are good or bad for your specific use case. We will be exploring the features - both good and bad - of some of the more popular ones and showing examples of jobs implemented in each. Hopefully you'll leave with a much better idea of the philosophy behind each system and how and where you can use them.
Bio: David spent 18 months in the data team at Last.fm and since February has been developing data infrastructure at Spotify - making him something of an expert in working with music data sets. He mostly works with Hadoop, but can occasionally be found dabbling in data warehousing, SQL query optimisation and front-end web apps, as well as telling everybody else they're not doing enough testing and that everything is better with static typing.
As well as generating music data, he also generates music under the guise of Demoscene Time Machine (http://music.demoscenetimemachine.com/), takes part in the occasional triathlon and has some very unusual dance moves.
• 7:00 - 7:20: Pizza & Networking
• 7:20 - 7:30: Livestream starts, intro
• 7:30 - 8:30: Dave's Talk
• 8:30 - 9:00: Q&A/Networking/Beers
We will be livestreaming this event. URL: http://www.livestream.com/spotifyevents (The password prompt will be disabled around 7:20)
Pizza and beverages will be available for the participants during the meetup.