
39th Bay Area Hadoop User Group (HUG) Monthly Meetup


  • 6:00 - 6:30 - Socialize over food and beer(s)
  • 6:30 - 7:00 - Removing the NameNode's memory limitation
  • 7:00 - 7:30 - Hue: the UI for Apache Hadoop
  • 7:30 - 8:00 - Compression Options in Hadoop - A Tale of Tradeoffs


Session I: Removing the NameNode's memory limitation

The current HDFS NameNode stores all of its metadata in RAM. This has allowed Hadoop clusters to scale to 100K concurrent tasks. However, available memory limits the total number of files that a single NameNode can store. While Federation allows one to create multiple volumes with additional NameNodes, there is still a need to scale a single namespace and to store multiple namespaces in a single NameNode.
This talk describes a project that removes the space limit while maintaining similar performance by caching only the working set, or hot metadata, in NameNode memory. We believe this approach will be very effective because the subset of files that is frequently accessed is much smaller than the full set of files stored in HDFS.
In this talk we will describe our overall approach and give details of our implementation along with some early performance numbers.
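
The working-set idea in this abstract can be illustrated with a small sketch. This is not the speakers' implementation: the class name `HotMetadataCache`, the dictionary standing in for the on-disk metadata store, and the LRU eviction policy are all assumptions made for illustration, since the abstract does not say how hot metadata is selected.

```python
from collections import OrderedDict


class HotMetadataCache:
    """Hypothetical LRU cache of file metadata in front of a slower store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stands in for on-disk metadata
        self.capacity = capacity            # max entries kept in memory
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # mark as most recently used
            self.hits += 1
            return self.cache[path]
        self.misses += 1
        meta = self.backing_store[path]     # slow path: load from the store
        self.cache[path] = meta
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return meta


# Toy namespace: 1,000 files, of which only 10 are accessed repeatedly.
store = {f"/data/file{i}": {"size": i, "replication": 3} for i in range(1000)}
cache = HotMetadataCache(store, capacity=50)

for _ in range(100):
    for i in range(10):
        cache.get(f"/data/file{i}")

hit_rate = cache.hits / (cache.hits + cache.misses)
print(f"hits={cache.hits} misses={cache.misses} hit_rate={hit_rate:.2f}")
```

In this toy run only the first access to each hot file goes to the backing store, so a cache far smaller than the namespace serves nearly all lookups. How well that carries over to a real cluster depends on the actual access pattern.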

Speaker: Lin Xiao, PhD student at Carnegie Mellon University, intern at Hortonworks


Session II: Hue: the UI for Apache Hadoop

Hue is an open source, Web-based interface that makes Apache Hadoop easier to use. Hue focuses on the Hadoop user experience and lets users concentrate on quick data processing. Hue is a mature Web project that integrates the Hadoop components and their main satellite projects into a single UI.
This talk describes how Hue’s apps like File Browser and Job Browser let you list, move, and upload HDFS files or access job logs in a few clicks. Workflows can be built and scheduled to run repeatedly with drag & drop interfaces and wizards, without having to deal with any Oozie XML.
Hue comes with three editors: Hive, Pig and Impala. Each editor improves readability and productivity by providing features like syntax highlighting. Other apps let you customize Solr search results, browse HBase tables or submit Sqoop jobs. Moreover, Hue comes with an SDK that lets developers reuse its libraries and start building apps on top of Hadoop.
To sum up, attendees of this talk will learn how Hue can broaden their Hadoop user base and why it is the ideal client for getting familiar with or using the platform.

Speaker: Romain Rigaux, Software Engineer, Cloudera


Session III: Compression Options in Hadoop - A Tale of Tradeoffs

Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo!'s Hadoop clusters. A key component that enables this efficient operation is data compression. With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This talk attempts to provide guidance in that regard. Performance results with Gridmix and with several corpora of data are presented. The talk also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on “Big Data” who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.
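
The tension between compression ratio and speed described above can be observed on a small scale with a rough sketch. It uses only Python's standard `zlib` (the DEFLATE algorithm behind gzip) and `bz2` modules rather than Hadoop's codecs, and the sample log line is made up, so the numbers say nothing about the results presented in the talk. Snappy and LZ4 are left out because they are not in the Python standard library.

```python
import bz2
import time
import zlib

# Made-up, highly repetitive sample data; real results depend on the actual data.
data = b"timestamp=2013-08-21 user=alice action=click page=/home\n" * 20000

# Each entry maps a label to a (compress, decompress) pair.
codecs = {
    "zlib level 1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib level 9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 level 9": (lambda d: bz2.compress(d, 9), bz2.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data       # round-trip sanity check
    ratio = len(data) / len(packed)         # higher means smaller output
    results[name] = (ratio, elapsed)
    print(f"{name:14s} ratio={ratio:8.1f} compress_time={elapsed:.4f}s")
```

Running the same loop over a sample of a job's real input gives a first impression of which codec fits, before measuring at cluster scale.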

Speaker: Govind Kamat, Member of Technical Staff, Yahoo!


Yahoo Campus Map:

Detail map


Location on Wikimapia:[masked]&lon=[masked]&z=18&l=0&m=b&search=yahoo



  • Haidar H.

    Was this meetup recorded? If so, how can I access it?

    August 22, 2013

  • Ali L.

    Hats off to Romain Rigaux and Hue, but I think there is room for further visualization. I wish Hue had a Hive and Pig query UI like regexpal (simple huh), where you could see the highlighted result in place, and also some graphing tools, nothing very fancy but good enough to avoid the need to go to MATLAB and R for everything!

    August 21, 2013

    • Ali L.

      Agree, Pig/Hive are batch oriented, but let's say for an analyst during dev or while debugging it is not that odd to play with small data. So, when you are dealing with small data jobs can/may finish faster and it would be nice to see how highlighted bread crumbs of data within the test data. That could give you more insight as how queries or analytic algorithms should change.

      August 22, 2013

    • Ali L.

      Correction: it would be nice to see how highlighted bread crumbs of data are extracted from the test data

      August 22, 2013

  • Ali L.

    Finally the talk I have been waiting for "Compression"!

    August 21, 2013

  • A former member

    Looking forward to the event

    July 25, 2013

  • Pratyaksha R.

    Hi All... want to learn hadoop..

    July 24, 2013


  • Lars S.

    Looking forwards to learning more

    July 11, 2013

  • rajesh v.

    This is my first meetup and I am excited

    July 6, 2013

  • Laura U.

    We are a different kind of Lab. Accenture Tech Labs Data Insights is ready to talk to data architects, developers with Hadoop/Cassandra/MapReduce skills, and those of you with fantastic ideas. All levels open. Check it out at

    June 20, 2013

Our Sponsors

  • Yahoo

    Free admission, Space, Pizza and Beer
