Advanced Hadoop Architectures and Unstructured Data Mining

Hadoop.Next, HDFS Federation and High Availability

Owen O'Malley

Cofounder and Senior Architect, Hortonworks 

Join Hortonworks cofounder and Apache Hadoop committer Owen O'Malley as he outlines Hadoop.Next and the approach and current status of the HDFS improvements. Apache Hadoop is the de facto Big Data platform for data storage and processing. The current stable production release is Hadoop 1.0. The Apache Hadoop community is actively working on Hadoop 0.23, the next major version of Hadoop. Its notable improvements include HDFS Federation, High Availability and NextGen MapReduce. The HDFS NameNode has proven to be a robust and reliable service in production at Yahoo, Facebook and other enterprises. However, the NameNode does not have automatic failover. A hot failover solution called HA NameNode is under active development (HDFS-1623) and making excellent progress.

Owen contributed patches to Hadoop before it became an independent Apache project. He was the first committer added and remains one of the most active contributors to Apache Hadoop. He was also the founding chair of the Apache Hadoop Project Management Committee. Before cofounding Hortonworks, Owen worked on Yahoo! Search's WebMap project, which built a graph of the known web and performed heuristic analyses over it. Once ported to Apache Hadoop, it became the single largest known Hadoop application. He has a PhD in Software Engineering from the University of California, Irvine. Follow Owen on Twitter: @owen_omalley.


New Architectural Possibilities for Hadoop

Ted Dunning

Chief Application Architect, MapR

Standard Hadoop comes with a number of assumptions rooted in its initial architecture. Many of these assumptions can be relaxed with more advanced architectures, such as those provided by MapR. One important cluster of these assumptions consists of work-arounds for the limitations of HDFS. MapR augments HDFS-compatible access with access to files across multiple clusters through standard protocols such as NFS, which makes many of these work-arounds unnecessary. I will describe the underlying architecture that MapR uses to enable these advances and show how it can simplify systems or, in some cases, make certain classes of programs run orders of magnitude faster.

Ted has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted was responsible for building the most advanced identity theft detection system on the planet. He also built one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 15 issued and 15 pending patents. He contributes to several Apache open source projects, including Hadoop, ZooKeeper and HBase, and is a committer for Apache Mahout. Ted earned a BS in electrical engineering from the University of Colorado, an MS in computer science from New Mexico State University, and a Ph.D. in computing science from Sheffield University in the United Kingdom.


Making Sense of the Data Chaos

Adam Gugliciello

Solution Engineer, Datameer

Until recently, data analysis by companies and government agencies has typically been based on structured datasets. This session will demonstrate how new insights can be gained from large amounts of text data that traditionally could not be mined or analyzed, such as company documents, emails and Twitter data. Through specific use cases and interesting examples, we will show how to take very large unstructured text documents and easily extract useful business insight from them. This talk will discuss uncovering and retrieving new insights from large volumes of data, gleaning value from unstructured and unused sources, and enhancing customer intelligence.

Adam Gugliciello is a 15-year veteran of software engineering and systems architecture who specializes in highly available, parallel systems. Most recently he has developed grid computing solutions that enable deep analyses and intelligence gathering on huge software systems for technical debt and functional mapping. Adam is a Solution Engineer at Datameer, where he brings financial and telco applications expertise to the use of the Datameer business intelligence suite.



6:00-6:30 pm - Networking

6:30-7:00 pm - First presentation

7:05-7:35 pm - Second presentation

7:40-8:10 pm - Third presentation

  • A former member

    Are these presentations available? Thanks.

    March 21, 2012

  • Ahmet M. U.

    Would definitely appreciate receiving a copy of the slides! Thanks.

    March 15, 2012

  • Martes W.

    This was my first Hadoop, or true "cloud" technology user's meeting and I was very pleasantly surprised at the level of professional network that was associated with the event. Very glad I could make it!!!!!

    March 14, 2012

  • Edward S.

    I thought this was a great event. Seemed like a smart crowd. I wish the MapR guy hadn't been so hostile towards Apache Hadoop, but I guess that is his job. A little 'sales-ey', but informative all the same and a good way to meet fellow practitioners.

    March 14, 2012

  • Asif I.

    It was OK. I was expecting some examples and a little bit more insight.

    March 14, 2012

  • Amit S.

    It was good...could have been more technical

    March 14, 2012

  • Hari D.

    Thanks to the Organizers & hosts - great job!

    I'd prefer less marketing/sales - more tech, things that are not immediately available on company websites. Thanks to MapR for the caps :) The concept of "data sliding" is very cool!

    March 14, 2012

  • Jr S.

    Thought it was some good useful information. In general liked the presenters.

    March 14, 2012

  • Bobby P.

    Great insight into products, and the examples given were very practical and relevant to challenges we're trying to professionally provide solutions for.

    March 14, 2012

  • Stephen R.

    While I thought the talks were educational beyond the sales-pitch presentations, I thought the bickering between the closed-source and open-source presenters detracted from the unifying aspect of the Hadoop meetup as a community. Although I recognize these jabs were good-natured, I view these meetups as a place where different and sometimes competing groups (Greenplum and Cloudera, for example) come together for the common goal of promoting Hadoop.

    March 14, 2012

  • Brian V.

    Too much sales pitch and not enough juice for MapR, and Owen's talk was too much of a rehash of material from the HA web site. And because time ran over, I never saw the third presentation, as I had to leave by 8.

    March 14, 2012

  • Vishal S.

    I expected to see some samples. The presenters were great, but I had different expectations. In addition, making it to DC at 6:00 is always a challenge.

    March 14, 2012

  • Robin C.

    I spent 2.5 hours driving to see this meetup, which isn't a big deal if the talks cover interesting topics. The description of this meetup seemed very interesting, but instead it was:
    Dunning's talk = a sales pitch
    O'Malley's talk = the most interesting talk but unfortunately VERY rushed
    Gugliciello's talk = another sales pitch and next to nothing about dealing with unstructured data

    March 14, 2012

  • Glenna G.

    Really cool stuff, thank you so much!

    March 14, 2012

  • Ryan S.

    Wasn't able to make it last minute. Any way to get the slides?
    Thanks for putting this on.

    March 14, 2012

  • Dennis P.

    Still stuck at the office in Columbia, MD. Please, anyone, feel free to take my spot. I really wish I could have been there...

    March 13, 2012

  • Chad C.

    And I'm still out here in Hyattsville; hopefully there'll be a few notes and a bootleg "recording"? :) Anybody who wants to walk right in and take my spot, feel free!

    March 13, 2012

  • Sunil M.

    Stuck in traffic approaching 66 forever... is it just me?

    March 13, 2012

  • Spencer

    Hey guys, I've got a deadline tomorrow that isn't coming along nearly fast enough so someone on the waiting list gets my spot.

    March 13, 2012

  • Ed K.

    The address is 1445 New York Ave NW, Washington, DC 20005. Sorry about any confusion.

    March 13, 2012

  • Opesh G.

    Is it 1445 New York Ave, Washington, DC 20002 or 1445 New York Ave NW, Washington, DC 20005?

    March 13, 2012

  • Brian V.

    I'm guessing this is NW, no?

    March 1, 2012

Our Sponsors

  • Booz Allen

    Thanks to Booz Allen for sponsoring the location and refreshments!
