November 2011 HUG Agenda:
- 6:00 - 6:30 - Socialize over food and beer(s)
- 6:30 - 7:00 - Oozie evolution: Gateway to the Hadoop ecosystem
- 7:00 - 7:30 - Blur - Lucene on Hadoop
- 7:30 - 8:00 - HParser, a data parsing solution for MapReduce and Hadoop
Oozie evolution: Gateway to the Hadoop ecosystem
During the past two years, Oozie has matured functionally and now plays a pivotal role in providing access to Hadoop resources through RESTful APIs, improved scheduling, and workflow management. During this maturation, Yahoo! also contributed Oozie to the Apache Software Foundation, widening the community of contributors and beneficiaries.
There remain significant challenges in making Oozie the gateway to Hadoop. This presentation will highlight some of the key advances, architectural issues, and challenges that face the Oozie community as Oozie continues to evolve.
Presenter: Mohammad Islam, Yahoo
Blur - Lucene on Hadoop
Blur is a new Hadoop-based project that combines Lucene, Hadoop, ZooKeeper, and Thrift to create a horizontally scalable, distributed read/write search engine that integrates into the Hadoop stack.
Presenter: Aaron McCurry, Near Infinity
HParser, a data parsing solution for MapReduce and Hadoop
Organizations are increasingly interested in finding more efficient ways to handle deeply hierarchical data in Hadoop, including XML and JSON as well as other complex data formats such as Web logs, binaries, and machine-generated data.
How are you currently developing and setting up data parsing tasks inside MapReduce? Are you interested in native streaming and splitting capabilities that allow effective handling of files of any size, regardless of format? In this session, we will introduce HParser, which is optimized for parallel parsing in Hadoop, and give a technical demonstration of it.
Presenter: Ronen Schwartz, Informatica
Yahoo Campus Map:
Location on Wikimapia: