By now, everyone's heard of Hadoop and Big Data. But few have had time to actually get started.
Come see it in action with a demo by Big Data veteran Paco Nathan, and learn about benefits, tradeoffs, and software support.
The demo will be a real implementation, running Python scripts in Hadoop Streaming on the Elastic MapReduce service.
Examples will build on a "wordcount" app (the "Hello World" of MapReduce) to show how to perform text mining and apply some simple machine learning approaches.
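For anyone who hasn't seen a Streaming job before, here is a minimal sketch of what a wordcount mapper and reducer in Python can look like. It is not the script from the talk's repo. The file name `wordcount.py`, the `map`/`reduce` argument convention, and the simple lowercase word regex are all assumptions made for illustration. Hadoop Streaming pipes input lines to the mapper on stdin, sorts the mapper's tab-separated `key<TAB>value` output by key, and pipes it to the reducer. That is why the reducer only needs to sum runs of identical adjacent keys.

```python
#!/usr/bin/env python3
"""Minimal wordcount mapper/reducer for Hadoop Streaming (illustrative sketch)."""
import re
import sys
from itertools import groupby

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def mapper(lines):
    """Emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per word; input must already be sorted so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(argv):
    # Hypothetical convention: "wordcount.py map" or "wordcount.py reduce".
    steps = {"map": mapper, "reduce": reducer}
    if len(argv) != 2 or argv[1] not in steps:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
        return 2
    for out in steps[argv[1]](sys.stdin):
        print(out)
    return 0


# Only run as a streaming step when a step name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

You can simulate the whole job locally before going to EMR, with `sort` standing in for Hadoop's shuffle: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`.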
Data for these examples comes from the "Enron email" data set available on Infochimps at http://infochimps.org/search?query=enron
We'll see what we can discover about those Enron email messages using EMR.
I cannot attend, but could you publish the video?
If anyone has info about a web conference for this, let me know.
Now streaming live at http://www.ustream.tv/channel/big-data
Many thanks for the opportunity to present last night! Lots of great questions and discussion. Glad to meet so many new people working with AWS and Hadoop!
Link to slides: http://www.slideshare.net/pacoid/getting-started-on-hadoo...
Link to code + data: http://github.com/ceteri/ceteri-mapred
The source repo on GitHub shows more detail about the Python scripts for the MapReduce jobs used on the Enron email. Also, check out the Gephi doc (requires download); Gephi is a really fun tool for exploring social graphs.
Is the demo a toy example or a real implementation?
I'm looking for a "Hello World" example, since I have no exposure to Hadoop code.