Bay Area Hadoop User Group (HUG) Monthly Meetup

November 2011 HUG Agenda:

  • 6:00 - 6:30 - Socialize over food and beer(s)
  • 6:30 - 7:00 - Oozie evolution: Gateway to the Hadoop ecosystem
  • 7:00 - 7:30 - Blur - Lucene on Hadoop
  • 7:30 - 8:00 - HParser, a data parsing solution for MapReduce and Hadoop

Oozie evolution: Gateway to the Hadoop ecosystem

During the past two years, Oozie has matured functionally and now plays a pivotal role in providing access to Hadoop resources through RESTful APIs, improved scheduling, and workflow management. During this time, Yahoo! also contributed Oozie to the Apache Software Foundation, widening the community of contributors and beneficiaries.

There remain significant challenges in making Oozie the gateway to Hadoop.  This presentation will highlight some of the key advances, architectural issues, and challenges that face the Oozie community as Oozie continues to evolve.

Presenter: Mohammad Islam, Yahoo

Blur - Lucene on Hadoop

Blur is a new Hadoop-based project that combines Lucene, Hadoop, ZooKeeper, and Thrift to create a horizontally scalable, distributed read/write search engine that integrates into the Hadoop stack.

Presenter: Aaron McCurry, Near Infinity

HParser, a data parsing solution for MapReduce and Hadoop

Organizations are increasingly interested in finding more efficient ways to handle deeply hierarchical data, including XML and JSON, as well as other complex data formats such as Web logs, binaries, and machine-generated data in Hadoop.

How are you currently setting up data parsing tasks inside MapReduce? Are you interested in native streaming and splitting capabilities that allow effective handling of files of any size, regardless of format? In this session, we will introduce HParser, which is optimized for parallel parsing in Hadoop, and include a technical demonstration of it.

Presenter: Ronen Schwartz, Informatica

Yahoo Campus Map:

Location on Wikimapia:[masked]&lon=[masked]&z=18&l=0&m=b&search=yahoo

  • doug c.

    great speakers

    November 17, 2011

  • Vlad S.

    Blur was the most interesting topic. Oozie didn't provide any real technical details, just chit-chat; not interesting for a tech guy.

    November 17, 2011

  • A former member

    Why don't you guys open this up to having Industry help coordinate/lead the meetings (aka, ask Avik Dey to come back...who cares if he's at eBay!)

    November 17, 2011

  • A former member

    Well organized, interesting speakers and topics

    November 17, 2011

  • A former member

    swell event

    November 16, 2011

  • Long P.

    Let's connect if you're inclined to see what my team is up to @Apple.
    Email: [masked]

    November 16, 2011

  • Shyam S.

    A training course on Financial Services, Dodd Frank Regulations and Hadoop :


    October 30, 2011

Our Sponsors

  • Yahoo

    Free admission, Space, Pizza and Beer
