
36th Bay Area Hadoop User Group (HUG) Monthly Meetup

Hosted By
Yahoo! HUG O.

Details

Agenda

6:00 - 6:30 - Socialize over food and beer(s)
6:30 - 7:00 - HBase as a Service at Yahoo!
7:00 - 7:30 - The Stinger Initiative: Making Apache Hive 100 Times Faster
7:30 - 8:00 - Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing

Session I (6:30 – 7:00 PM): HBase as a Service at Yahoo!

HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. Yahoo! has used HBase for a long time in isolated, one-off deployments. A multi-tenant platform now makes it possible for all our grid customers to take advantage of HBase capabilities. We will give a brief overview of HBase and how it works (several of you asked for back-to-basics talks), and then spend most of our time on multi-tenancy with HBase.

Presenter(s):

Francis Christopher Liu, Software Engineer, Yahoo! and PPMC Member, Apache HCatalog

Vandana Ayyalasomayajula, Software Engineer, Yahoo! and PPMC Member, Apache HCatalog

Session II (7:00 – 7:30 PM): The Stinger Initiative: Making Apache Hive 100 Times Faster

Apache Hive and its HiveQL interface have become the de facto SQL interface for Hadoop. Apache Hive was originally built for large-scale operational batch processing, and it is very effective for reporting, data mining, and data preparation use cases. These usage patterns remain very important, but with the widespread adoption of Hadoop, the enterprise requirement for Hadoop to become more real-time or interactive has grown in importance as well. Enabling Hive to serve human-time use cases (i.e., queries in the 5-30 second range) such as big data exploration, visualization, and parameterized reports, without resorting to yet another tool to install, maintain, and learn, can deliver a lot of value to the large community of users with existing Hive skills and investments. To this end, we have launched the Stinger Initiative, with input and participation from the broader community, to enhance Hive with more SQL and better performance for these human-time use cases. We believe the performance changes we are making today, along with the work being done in Tez, will transform Hive into a single tool that Hadoop users can use for report generation, ad hoc queries, and large batch jobs spanning tens or hundreds of terabytes.

Presenter(s):

Alan Gates, Co-founder, Hortonworks, Apache Pig PMC and Apache HCatalog PPMC Member, and author of "Programming Pig" from O'Reilly Media

Owen O’Malley, Co-founder, Hortonworks, First committer added to Apache Hadoop and Founding Chair of the Apache Hadoop PMC

Session III (7:30 – 8:00 PM): Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing

At Yahoo!, Hadoop plays a central role in providing personalized experiences for our users and creating value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. Through a collection of use cases, we will explain how Yahoo! delivers personalized user experiences through Hadoop and Storm. We have developed Storm-on-YARN so that Storm streaming/micro-batch applications and Hadoop batch applications can be hosted on a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. Yahoo! has recently released our Storm enhancements as open source.

Presenter(s):

Andy Feng, Distinguished Architect, Cloud Engineering Group, Yahoo!

Bobby Evans, Tech Yahoo!, Apache Hadoop PMC and Committer

Yahoo Campus Map:

Detail map (http://photos4.meetupstatic.com/photos/event/2/8/e/d/600_21370477.jpeg)

Location on Wikimapia:

http://www.wikimapia.org/#lat=37.4181633&lon=-122.0250607&z=18&l=0&m=b&search=yahoo
