Past Meetup

Hadoop 2.0: What's coming?

122 people went

Details

Our Speaker:

Sanjay Radia - Bio

Sanjay is a founder and architect at Hortonworks. He is an Apache Hadoop committer and a member of the Apache Hadoop PMC.

Prior to co-founding Hortonworks, Sanjay was the chief architect of core-Hadoop at Yahoo and part of the team that created Hadoop. In Hadoop he has focused mostly on HDFS, MapReduce schedulers, high availability, compatibility, etc. He has also held senior engineering positions at Sun Microsystems and INRIA, where he developed software for distributed systems and grid/utility computing infrastructures. Sanjay has a PhD in Computer Science from the University of Waterloo in Canada.

Our Agenda:

Talk 1: Hadoop 2 - What is New (60mins)

The upcoming major release, Hadoop 2.0, offers several significant HDFS and YARN/MapReduce improvements. The HDFS improvements include a new append pipeline, federation, wire compatibility, NameNode HA, snapshots, and performance improvements. We describe how to take advantage of these new features and the benefits they bring. We cover some architectural improvements in detail, such as HA, federation, and snapshots. Apache Hadoop YARN is the new basis for running MapReduce and other applications on a Hadoop cluster. It recasts Hadoop as a more generic data processing system. We describe the architecture of YARN and the benefits it offers.

- Questions (10mins)

Talk 2: Stinger (45mins)

Apache Hadoop and its ecosystem projects Hive and Pig support interactions with data sets of enormous size. Petabyte-scale data warehouse infrastructures are built on top of Hadoop to provide access to data sets both massive and small. Hadoop has always excelled at large-scale data processing; however, running smaller queries has been problematic due to the batch-oriented nature of the system. The enhancements we have made to the resource management system (YARN), to the Hadoop execution environment (called Tez), and to Hive elevate the Hadoop ecosystem, and query processing in particular, to be much more powerful, performant, and user-friendly. This talk will cover the improvements we have made to YARN, MapReduce, and Hive.

- Questions (10mins)