Hadoop 2.0: What's coming?



Our Speaker:

Sanjay Radia - Bio

Sanjay is founder and architect at Hortonworks. Sanjay is an Apache Hadoop committer and member of the Apache Hadoop PMC.

Prior to co-founding Hortonworks, Sanjay was the chief architect of core Hadoop at Yahoo and part of the team that created Hadoop. In Hadoop he has focused mostly on HDFS, MapReduce schedulers, high availability, compatibility, etc. He has also held senior engineering positions at Sun Microsystems and INRIA, where he developed software for distributed systems and grid/utility computing infrastructures. Sanjay has a PhD in Computer Science from the University of Waterloo in Canada.


Our Agenda:

Talk 1: Hadoop 2 - What is New (60mins)

The upcoming major release, Hadoop 2.0, offers several significant HDFS and YARN/MapReduce improvements. The HDFS improvements include a new append pipeline, federation, wire compatibility, NameNode HA, snapshots, and performance improvements. We describe how to take advantage of these new features and their benefits. We cover some architectural improvements in detail, such as HA, federation, and snapshots. Apache Hadoop YARN is the new basis for running MapReduce and other applications on a Hadoop cluster. It recasts Hadoop as a more generic data processing system. We describe the architecture of YARN and the benefits it offers.

- Questions (10mins)

Talk 2: Stinger (45mins)

Apache Hadoop and its ecosystem projects Hive and Pig support interactions with data sets of enormous sizes. Petabyte-scale data warehouse infrastructures are built on top of Hadoop to provide access to data sets both massive and small. Hadoop has always excelled at large-scale data processing; however, running smaller queries has been problematic due to the batch-oriented nature of the system. The enhancements we have made to the resource management system (YARN), to the Hadoop execution environment (called Tez), and to Hive elevate the Hadoop ecosystem, and query processing in particular, to be much more powerful, performant, and user-friendly. This talk will cover the improvements we have made to YARN, MapReduce, and Hive.

- Questions (10mins)




  • John M.

    Can we get these slides after?

    April 18, 2013

    • Faisal A.

      Hi Adam, do you have the links to all of the slides?

      April 26, 2013

    • Adam M.

      I could not get the same deck due to some NDA issues. However, it is all based on Hadoop Summit material so please check the general THUG discussion group for my last post with the links.

      April 26, 2013

  • David T.

    When you have a chance, I'm looking forward to seeing Sanjay's presentation decks.

    As I recall, he described the new storage architecture as a form of index, where the unique data values are stored with pointers to the row and column in which they occur (as opposed to actually storing the rows).

    I'm wondering whether query functionality is being envisioned that allows the index to be searched in a fairly direct manner for highly correlated values. For example, given rows with column food = "ice cream", could I quickly see that there's a column called "flavour" that frequently equals "chocolate" within the same rows, without first needing to know the other column name or the other frequently occurring value (...and then walk along the tree branch)?

    At the end of the meeting, you had also asked for help on future meetups. Are there rough dates/locations/roles to help people commit?

    2 · April 24, 2013

    • Adam M.

      Sanjay was referring to ORCFile: http://www.slideshare... In May I will be doing a Hive 0.11 using/hacking session. And yes, I will be posting new meetups soon. The next is Mahout and we now have 3 separate presenters/demos to go through.

      1 · April 24, 2013

  • Marius B.

    A good overview of the current state of Hadoop and what to expect in the future, even though the way it was presented was a bit too opinionated for my taste.

    April 21, 2013

  • Sunil R.

    very informative..

    April 19, 2013

  • Venkat M.

    Thanks to Sanjay and Adam. Excellent presentation with lots of information and lots of things to look at in Hadoop 2.0.

    April 19, 2013

  • Tri N.

    Pr Radia's presentation was eloquent. I am thrilled waiting for Hadoop 2.0

    April 19, 2013

  • Raghu S.

    Sanjay was very inspiring and gave an excellent speech. Hope we get to hear him again. Lots of information and lots to ponder. Thank you, Adam, for this excellent meetup.

    April 19, 2013

  • Pankaj T.

    Thanks to Adam for getting Sanjay here and to Edward for providing space for such a wonderful experience. Kudos to Sanjay for providing great insight. Thanks guys.

    April 18, 2013

  • John M.

    Great event!

    April 18, 2013

  • Edwin C.

    Bang on the door and security will let you in and we'll escort you up.

    1 · April 18, 2013

  • Amarinder Singh( A.

    looking forward to this event.....

    April 17, 2013

  • Sunil R.

    Great! Looking forward to it.

    April 11, 2013

  • Adam M.

    I have updated our Speaker information and Agenda

    1 · April 10, 2013

  • Ron M.


    April 2, 2013

  • Edwin C.

    We can host at LoyaltyOne as well. We can accommodate over 100 people if needed in our large meeting space. We're at 438 University Ave (St Patrick Station).

    1 · April 2, 2013

  • Jordan C.

    I might be able to arrange space at Kobo. How much are you looking for?

    April 1, 2013

    • Adam M.

      A lot. :) I expect this to hit about 50-70 based on past attendance. BNotions has stepped up but we still need to confirm. Is the Kobo office near King/Spadina?

      April 2, 2013

    • Jordan C.

      Kobo is in Liberty Village (King and Dufferin). Email me if I can help (jc at kobo dot com)

      April 2, 2013

  • Tri N.

    Hi Adam,

    T4G can also offer our Lunch/Meeting room like last time. Location Queen / Broadview ave. Can fit 50 persons comfortably.

    April 2, 2013

  • jon r.

    this looks excellent!
    Looking forward to it! -JR

    April 1, 2013

  • David L.

    Exploring Hadoop and interested in learning more about it.

    1 · April 1, 2013

  • Faisal F.

    I will be there...

    April 1, 2013

Our Sponsors

  • IBM

    Meeting facilities, expert speakers, free product, books and education.

  • Big Data University

    Free on-line courses in Hadoop and big data related technologies.

  • Cloudera

    10% off training for Toronto Hadoop User Group members.

  • Hortonworks

    Food, speakers, beverages

  • T4G

    Hosting Meeting locations and providing relevant speakers
