July 19, 2010 7:00 PM - 120 attended

Getting Started on Hadoop

Fenwick & West

Selected By: Sebastian

By now, everyone's heard of Hadoop and Big Data. But few have had time to actually get started.

Come see it in action with a demo by Big Data veteran Paco Nathan, and learn about benefits, tradeoffs, and software support.

  • Sridhar

    Is the demo a toy example or a real implementation?

    I'm looking for a helloworld example since I have no exposure to Hadoop code.

    Posted July 13, 2010 at 4:36 PM
  • Paco Nathan

    The demo will be a real implementation, running Python scripts in Hadoop Streaming on the Elastic MapReduce service.

    Examples will build on a "wordcount" app (which is the "Hello World" of MapReduce) to show how to perform text mining and some simple machine learning approaches.

    Data for these examples comes from the "Enron email" data set available on Infochimps at http://infochimps.org/search?query=enron

    We'll see what we can discover about those Enron email messages, using EMR.

    Posted July 14, 2010 at 9:13 AM
  • Sridhar

    Hmmmmm, okay. Thanks Paco.

    Posted July 17, 2010 at 10:00 AM
  • Sergey Zelvenskiy

    I cannot attend, but could you publish the video?

    Posted July 19, 2010 at 6:25 PM
  • James Moore

    If anyone has info about a web conference for this let me know.

    Posted July 19, 2010 at 6:34 PM
  • Sebastian

    Now streaming live at http://www.ustream.tv/channel/big-data !

    Posted July 19, 2010 at 6:52 PM
  • Paco Nathan

    Many thanks for the opportunity to present last night! Lots of great questions and discussion. Glad to meet so many new people working with AWS and Hadoop!

    Link to slides: http://www.slideshare.net/pacoid/getting-started-on-hadoo...

    Link to code + data: http://github.com/ceteri/ceteri-mapred

    The source repo on GitHub shows more detail about the Python scripts for the MapReduce jobs used on the Enron email. Also, check out the Gephi doc (requires download); Gephi is a really fun tool for exploring social graphs.

    Posted July 20, 2010 at 9:44 AM
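For readers who, like Sridhar, want a "hello world" starting point: Paco's reply describes building on a wordcount app written as Python scripts for Hadoop Streaming. The sketch below is not the code from the talk or from the linked repo; it is a minimal illustration of the general pattern, assuming that the mapper and reducer read text lines and exchange tab-separated key/value records. The names `map_lines`, `reduce_lines`, `run_locally`, and `main` are made up for this example.

```python
import re
import sys
from itertools import groupby

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per token in each input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each run of identical keys.

    Hadoop Streaming sorts mapper output by key before the reducer reads it,
    so all records for one word arrive consecutively.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process, without a cluster."""
    return list(reduce_lines(sorted(map_lines(lines))))


def main(role, stream=sys.stdin, out=sys.stdout):
    """Hypothetical entry point: role is 'map' or 'reduce'."""
    step = map_lines if role == "map" else reduce_lines
    for record in step(stream):
        out.write(record + "\n")
```

Calling `run_locally(["the cat saw the dog", "The dog ran"])` exercises the whole pipeline on a laptop. On a cluster, a script like this would typically be passed to Hadoop Streaming through its `-mapper` and `-reducer` options, with each script calling `main("map")` or `main("reduce")`; the exact job setup on Elastic MapReduce is not covered by this thread.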

4.00 (12 ratings)
