
Bay Area Hadoop User Group (HUG) Monthly Meetup

Detailed agenda and summaries to follow. General agenda:

  • 6:00 - 6:30 - Socialize over food and beer(s)
  • 6:30 - 7:00 - HCatalog Overview
  • 7:00 - 7:30 - Rhadoop, Hadoop for R
  • 7:30 - 8:00 - Storm: distributed and fault-tolerant realtime computation

HCatalog Overview

HCatalog is a table and storage abstraction system that makes it easy for multiple tools to interact with the same underlying data. A common buzzword in the NoSQL world today is polyglot persistence, which comes down to picking the right tool for the job. In the Hadoop ecosystem, many tools might be used for data processing: Pig, Hive, your own custom MapReduce program, or that shiny new GUI-based tool that's just come out. Which one to use might depend on the user, on the type of query you're interested in, or on the type of job you want to run. From another perspective, you might want to store your data in a columnar format for efficient storage and retrieval under particular query types, or in text so that users can write data producers in scripting languages like Perl or Python, or you may want to hook up an HBase table as a data source.

As an end-user, I want to use whatever data processing tool is available to me. As a data designer, I want to optimize how data is stored. As a cluster manager or data architect, I want the ability to share pieces of information across the board and to move data back and forth fluidly. HCatalog aims to make all of the above possible.

Presenter: Sushanth Sowmyan, Hortonworks
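The abstract above describes a catalog that lets different tools read the same logical table regardless of how the bytes are stored. As a rough illustration of that idea only, here is a minimal Python sketch; all names (`Catalog`, `register`, `load`, the reader functions) are hypothetical and are not HCatalog's actual API, which is a Java/Thrift service used via HCatLoader, HCatStorer, and the like.

```python
class Catalog:
    """Toy metadata service: maps table names to (schema, storage handler) pairs."""
    def __init__(self):
        self._tables = {}

    def register(self, name, schema, reader):
        self._tables[name] = (schema, reader)

    def load(self, name):
        schema, reader = self._tables[name]
        # Every consumer gets rows keyed by schema fields, regardless
        # of how the bytes are laid out underneath.
        return [dict(zip(schema, row)) for row in reader()]

# Two different "storage handlers": delimited text vs. a columnar layout.
def text_reader():
    for line in ["alice\t3", "bob\t5"]:
        user, clicks = line.split("\t")
        yield (user, int(clicks))

def columnar_reader():
    users, clicks = ["carol", "dave"], [7, 2]  # column-oriented in memory
    return zip(users, clicks)

catalog = Catalog()
catalog.register("clicks_text", ["user", "clicks"], text_reader)
catalog.register("clicks_col", ["user", "clicks"], columnar_reader)

# A Pig-like or Hive-like consumer sees identical rows from either table.
for table in ("clicks_text", "clicks_col"):
    for row in catalog.load(table):
        assert set(row) == {"user", "clicks"}
```

The point of the sketch is the separation of concerns: the data designer chooses the reader, while every processing tool consumes the same schema through the catalog.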

Rhadoop, Hadoop for R

RHadoop is an open source project aiming to combine two rising stars in the analytics firmament: R and Hadoop. With more than 2M users, R is arguably the dominant language for expressing complex statistical computations, and Hadoop needs no introduction at HUG. With RHadoop we are trying to combine the expressiveness of R with the scalability of Hadoop, and to pave the way for the statistical community to tackle big data with tools they are familiar with. At this time RHadoop is a collection of three packages that interface with HDFS, HBase, and mapreduce, respectively. The mapreduce package is called rmr, and we have tried to give it a simple, high-level interface that is true to the mapreduce model and integrated with the rest of the language. We will cover the API and provide some examples.

Presenter: Antonio Piccolboni, Revolution Analytics
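Since rmr is an R package, its actual API is not shown here; but the mapreduce programming model it exposes can be sketched in a few lines of Python. This is a single-process toy (the `mapreduce`, `map_words`, and `sum_counts` names are invented for illustration): a map function emits key-value pairs, a shuffle groups values by key, and a reduce function folds each group.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    # Map phase: each record emits zero or more (key, value) pairs.
    shuffled = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            shuffled[key].append(value)   # shuffle: group values by key
    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in shuffled.items()}

# Classic word count expressed in this model.
def map_words(line):
    for word in line.split():
        yield word, 1

def sum_counts(word, counts):
    return sum(counts)

result = mapreduce(["big data big", "data tools"], map_words, sum_counts)
# result == {"big": 2, "data": 2, "tools": 1}
```

On a real cluster the map and reduce calls run in parallel across machines and the shuffle moves data over the network; rmr's contribution is letting R users write just the two functions.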

Storm: distributed and fault-tolerant realtime computation

Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. And it’s fast — you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language. Storm was open-sourced by Twitter in September of 2011 and has since been adopted by many companies around the world. 
Storm has a wide range of use cases, from stream processing to continuous computation to distributed RPC. In this talk I'll introduce Storm and show how easy it is to use for realtime computation.

Presenter: Nathan Marz, Twitter
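The guarantee mentioned above, that every message will be processed, rests on Storm's spout/bolt model: a spout emits tuples, bolts process them, and a tuple is acknowledged back to the spout only after downstream processing succeeds, so failed tuples can be replayed. The following is a conceptual single-process sketch of that flow in Python; the class and method names are hypothetical, not Storm's real API (which is a JVM TopologyBuilder running spouts and bolts across a cluster).

```python
from collections import defaultdict

class WordSpout:
    """Toy tuple source that tracks which tuples are still unacknowledged."""
    def __init__(self, sentences):
        self.pending = list(sentences)  # tuples awaiting acknowledgement
        self.acked = []

    def next_tuple(self):
        return self.pending[0] if self.pending else None

    def ack(self, sentence):
        self.pending.remove(sentence)
        self.acked.append(sentence)

class CountBolt:
    """Toy bolt that maintains a running word count."""
    def __init__(self):
        self.counts = defaultdict(int)

    def execute(self, sentence):
        for word in sentence.split():
            self.counts[word] += 1

def run(spout, bolt):
    # Guaranteed processing in miniature: ack only after the bolt succeeds,
    # so a tuple that failed mid-flight would stay pending and be replayed.
    while (tup := spout.next_tuple()) is not None:
        bolt.execute(tup)
        spout.ack(tup)

spout = WordSpout(["storm storm", "realtime storm"])
bolt = CountBolt()
run(spout, bolt)
# bolt.counts == {"storm": 3, "realtime": 1}; spout.pending is empty
```

In real Storm the same ack/replay bookkeeping is distributed and fault-tolerant, which is what makes the at-least-once processing guarantee hold across machine failures.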

Yahoo Campus Map:



  • A former member

    Good experience. The Storm presentation was awesome. I felt sorry for the first speaker, who had to deal with audio issues.

    One suggestion: we need to better separate the presentation area from the informal meeting in the back of the room. The noise of people talking was distracting from the presenters.

    January 23, 2012

  • A former member

    I've posted the slides from my Storm talk online:

    January 19, 2012

  • Paul L.

    The technical difficulties with the audio system made it very difficult to follow the presenters. The topics were of interest, and hearing the oral presentation was necessary to take full advantage of the knowledge transfer. As the main speaker system was not operational, a backup system was used; however, the backup was barely audible.

    January 19, 2012

  • James Z.

    Topics were OK, but the sound quality was poor.

    January 19, 2012

  • Rabi K.

    Overall great.
    1st session was very high-level; I expected it to be more technical.
    2nd session (rmr) was a good mix of use-case information.
    3rd session (Storm) was very good.

    January 19, 2012

  • A former member

    The meeting went well for such a large group. My only issue was that a third of the talks were about non-Hadoop technologies. While those talks were interesting, that somewhat defeats the point of the group, I think.

    January 19, 2012

  • A former member

    Not so good. The audio system did not work properly, on top of people talking in the back. I did not gain much beyond what could be learned by reading the abstracts. Due to the audio problems, I left at 7:30 pm.

    January 19, 2012

  • Mayank V.

    poor sound quality

    January 19, 2012

  • A former member

    It was probably one of my worst experiences in recent times. The audio quality was very poor, and the speaker spent a lot of time explaining general database issues, which is far too basic for the professionals gathered.
    The cafe was not a good choice of setting for the meetup, with lots of humming from the coffee and soda machines and lots of drunk techies talking loudly in their own groups!
    Here's what I would suggest for the next meetup:
    1. Limit the number of attendees, say to the first 100 only. Don't make it an open-ended free-for-all.
    2. Select a location that is a conference room or hall with a proper audio/video setup.
    3. DO NOT serve free booze. Techies go wild with a beer-and-pizza combination!!
    4. Start on time, lay out the agenda, and ask attendees to submit their questions to a few volunteers, who will then coordinate with the speaker for a Q&A session at the end.
    I hope the next meetup will be better, and I'm looking forward to attending many more!

    January 19, 2012

  • Rahul C.

    Great meetup for seeing the various tools that make Hadoop work and interoperate. Storm was the best of the three.

    January 19, 2012

Our Sponsors

  • Yahoo

    Free admission, Space, Pizza and Beer
