
34th Bay Area Hadoop User Group (HUG) Monthly Meetup


  • 6:00 - 6:30 PM - Socialize over food and beer(s), General announcements
  • 6:30 - 7:00 PM - Session I: DistCp Redux and the Dynamic InputFormat
  • 7:00 - 7:30 PM - Session II: Impala - Real-time Queries for Apache Hadoop
  • 7:30 - 8:00 PM - Session III: Cloud-Friendly Hadoop and Hive

Session I (6:30 - 7:00 PM) : DistCp Redux and the Dynamic InputFormat

DistCp (distributed copy) is a popular tool for large inter- and intra-cluster copies. It uses MapReduce for distribution, error handling and recovery, and reporting. This talk will cover the rationale behind the DistCp rewrite for Hadoop 0.23, its design and new features, and a performance comparison with the legacy implementation. It will also introduce a different approach to balancing load across mapper tasks via the DynamicInputFormat.

Speaker: Mithun Radhakrishnan, Software Engineer, Yahoo!

Session II (7:00 - 7:30 PM) : Impala - Real-time Queries for Apache Hadoop

The Cloudera Impala project is for the first time making scalable parallel database technology, which is the underpinning of Google's Dremel as well as that of commercial analytic DBMSs, available to the Hadoop community. With Impala, the Hadoop community now has an open-sourced codebase that allows users to issue low-latency queries to data stored in HDFS and Apache HBase using familiar SQL operators. This talk will start out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation, and will conclude with a comparison of Impala with Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.

Speaker: Mark Grover, Software Engineer, Cloudera

Session III (7:30 - 8:00 PM) : Cloud-Friendly Hadoop and Hive

The cloud lowers the barrier to entry into analytics for many small and medium-sized enterprises. Hadoop and related frameworks like Hive, Oozie, and Sqoop are becoming tools of choice for deriving insights from data. However, these frameworks were designed for in-house datacenters, which have different tradeoffs from a cloud environment, and making them run well in the cloud presents some challenges. In this talk, we describe how we've extended Hadoop and Hive to exploit these new tradeoffs and offer them as part of the Qubole Data Service (QDS). We will also present use cases that show how QDS is making it extremely easy for an end user to use these technologies in the cloud.

Speaker: Ashish Thusoo, CEO, Qubole



  • Joe B.

    Thanks guys, I got lucky and won the strata Tix, Awesome!!
    I really wanted to go

    January 18, 2013

  • Leo

    Would suggest to have architecture diagram (with a bit more meat in it) for distcp. Prior / post (+- where is the beef)

    January 17, 2013

  • Ilya E.

    The first presentation was not so good: too many details on a very narrow topic.
    The second and third ones were really good.

    January 17, 2013

  • Yahoo! HUG O.

    Free Hadoop and Big Data books and a pass to Strata Conference (Santa Clara, Feb 26-28, 2013) at this meetup, courtesy of O’Reilly Media. Please bring your business card if you are interested in the drawing for the books and conference pass.

    January 10, 2013

  • Pashu P.

    Hi everyone! For those who can't make it to this great event, we still have a few open seats that same night for 'The Elephant Riders', a discussion around the relevance of Hadoop with an elite set of technologists. - Wishing you all a great event!

    January 9, 2013

Our Sponsors

  • Yahoo

    Free admission, Space, Pizza and Beer
