37th Bay Area Hadoop User Group (HUG) Monthly Meetup


Details
Detailed agenda and summaries to follow. General agenda:
6:00 - 6:30 - Socialize over food and beer(s)
6:30 - 7:00 - HCatalog/Hive Data Out
7:00 - 7:30 - Apache Sqoop 2 - A next generation of data transfer tools
7:30 - 8:00 - Building common denominator of Hadoop distributions with Bigtop
Session I (6:30 – 7:00 PM): HCatalog/Hive Data Out
The Yahoo! Hadoop grid uses a managed service to pull data into the clusters. However, when it comes to getting data out of the clusters, the choices are limited to proxies such as HDFSProxy and HTTPProxy. With the introduction of HCatalog services, customers of the grid now have their data represented in a central metadata repository. HCatalog abstracts away file locations and the underlying storage format of the data, and offers other advantages such as sharing data among MapReduce, Pig, and Hive. In this talk, we will focus on how the ODBC/JDBC interface of HiveServer2 addresses the use case of getting data out of the clusters when HCatalog is in use and users no longer want to worry about files, partitions, and their locations. We will also demo the data-out capabilities and go through other useful properties of the data-out feature.
Presenter(s): Sumeet Singh, Director, Product Management, Yahoo!
Chris Drome, Technical Yahoo!
Session II (7:00 – 7:30 PM): Apache Sqoop 2 - A next generation of data transfer tools
Apache Sqoop 2 is the next generation of the massively successful open source tool designed to transfer data between Apache Hadoop and traditional SQL databases and warehouses. Sqoop 2 is designed as a client-server system with a repository that stores connection and job information, and it is designed to support secure job submission and multiple user roles. In this talk, we will discuss the issues users faced in Sqoop 1, the design of Sqoop 2, and how Sqoop 2 addresses those issues.
Presenter(s): Hari Shreedharan, Software Engineer, Cloudera
Session III (7:30 – 8:00 PM): Building common denominator of Hadoop distributions with Bigtop
What does it take to get to Hadoop 2 GA?
Bigtop is stepping up in its role as the foundation of a standard Hadoop-based data analytics stack, essentially putting most of the commercial offerings on a common footing. 6 out of 7 commercial vendors are using the Bigtop framework to power their distributions based on ASF Hadoop.
Bigtop is also the must-have stabilization tool for the Hadoop platform, with which any downstream application or system developer can make sure that their software will work with the next version of Hadoop.
Presenter(s): Dr. Konstantin Boudnik, ASF Hadoop committer, Bigtop PMC; Director of Engineering, WANdisco
Roman Shaposhnik, VP, Apache Bigtop, IPMC member at ASF; Software Engineer, Cloudera, Inc.
Yahoo Campus Map:
Detail map (http://photos4.meetupstatic.com/photos/event/2/8/e/d/600_21370477.jpeg)
Location on Wikimapia:
http://www.wikimapia.org/#lat=37.4181633&lon=-122.0250607&z=18&l=0&m=b&search=yahoo
