
SHUG6. Apache Drill (M. Hausenblas), Popular Hadoop MapReduce tools (D. Whiting)

Hosted By
Adam K.

Details

We are happy to invite you to the 6th meeting of the Stockholm Hadoop User Group! Again, two presentations will be delivered. Please find the details below:

First presentation

Title: Apache Drill - interactive analytics for large-scale datasets

Speaker: Michael Hausenblas, Chief Data Engineer EMEA, MapR Technologies

Abstract: Apache Drill is a distributed system for interactive analysis of large-scale datasets, inspired by Google’s Dremel technology. It is designed to scale to thousands of servers and to process petabytes of data in seconds, enabling SQL-on-Hadoop and supporting a variety of data sources. Since its inception a year ago, Apache Drill has gained widespread interest in the community, attracting hundreds of people.

We will discuss how Apache Drill enables ad-hoc interactive queries at scale, review the system architecture and walk through a use case. We then focus on Apache Drill's unique support for a variety of back-ends (HDFS, HBase, MySQL, MongoDB, CouchDB, etc.) and its extensibility points, and show a demo of the system.

Bio: Michael works at MapR Technologies as Chief Data Engineer EMEA. His background is in large-scale data integration research and development, advocacy and standardisation. He has experience with NoSQL databases and the Hadoop ecosystem.

Michael speaks at events, blogs about big data, and writes articles and books on the topic. Michael contributes to Apache Drill, a distributed system for interactive analysis of large-scale datasets.

Second presentation

Title: "Scalding the Crunchy Pig for Cascading into the Hive": Evaluating the pros and cons of popular Hadoop processing tools and frameworks.

Speaker: David Whiting, Data Engineer at Spotify

Abstract: Cascading, Scalding, Cascalog, Crunch, Scrunch, Pig, Hive - there's a plethora of options when it comes to processing your data in Hadoop, and there's always somebody with a strong opinion about which one is best for each occasion. It's often hard to get a sense of how they differ from each other and how they are good or bad for your specific use case. We will be exploring the features - both good and bad - of some of the more popular ones and showing examples of jobs implemented in each. Hopefully you'll leave with a much better idea of the philosophy behind each system and how and where you can use them.

Bio: David spent 18 months in the data team at Last.fm and since February has been developing data infrastructure at Spotify - making him something of an expert in working with music data sets. He mostly works with Hadoop, but can occasionally be found dabbling in data warehousing, SQL query optimisation and front-end web apps, as well as telling everybody else they're not doing enough testing and that everything is better with static typing.

As well as generating music data, he also generates music under the guise of Demoscene Time Machine ( http://music.demoscenetimemachine.com/ ), takes part in the occasional triathlon and has some very unusual dance moves.

Additional information:

Please RSVP to this meetup, since we need to put everybody on a guest list for entering the Spotify office. The event will be held in the cafeteria of the Spotify office, so don’t go to the normal entrance but to the 11th floor.

Pizza and beverages will be available for participants during the meetup. This is another reason to RSVP if you plan to come - it will help us estimate the number of pizzas and drinks based on declared attendance.

The door will be open between 17:45 and 18:15.* Because of fire regulations, we need to keep a list of everybody in the building, so please make sure that you get your name ticked off the list at the entrance or, in the case of a +1, that the person at the door puts your name on the list.

*Unfortunately, we cannot leave the door open all the time (company security policy), nor can we have a person constantly watching for guests who arrive late. If you really need to come later, please let us know in the comments below, so that somebody can come to the door and open it at a given time.

See you soon!

SHUG Team

Stockholm Hadoop User Group
Spotify Office
Birger Jarlsgatan 61 (11tr) · Stockholm