Storm 101: Find Top Tweets in Real-Time


Details
DATA: A live-coding session where you will learn how to use Storm. This event is for software developers with some background in Java programming who are interested in distributed data processing.
Storm (http://storm-project.net) is a distributed real-time stream processing framework open-sourced by Nathan Marz of Twitter.
We'll be using Storm to analyze a randomized 1% (https://dev.twitter.com/docs/api/1.1/get/statuses/sample) of the real-time Twitter firehose to identify the most retweeted tweets around the globe.
AGENDA
Pro-Talk [30 min]: Abhi will do a short presentation on Storm fundamentals (https://github.com/nathanmarz/storm/wiki/Concepts).
Live Coding Tutorial [90 min]: As a group, we will work through a Storm project to filter and analyze tweets from Twitter's streaming API. We'll highlight the most popular real-time tweets across Twitter based on the number of retweets received.
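To give a feel for the counting step behind "most retweeted," here is a minimal sketch in plain Java. It is not the session's code and uses no Storm or Twitter4j APIs: the class name TopRetweets and its methods are hypothetical, and it assumes each incoming retweet has already been reduced to the numeric ID of the original tweet. In a Storm topology, logic along these lines would typically sit inside a bolt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Storm or Twitter4j): keeps a running
// retweet count per original tweet ID and reports the most retweeted IDs.
final class TopRetweets {
    private final Map<Long, Integer> counts = new HashMap<Long, Integer>();

    // Record one observed retweet of the tweet with the given ID.
    void recordRetweet(long originalTweetId) {
        Integer current = counts.get(originalTweetId);
        counts.put(originalTweetId, current == null ? 1 : current + 1);
    }

    // Return up to n tweet IDs, highest count first; ties go to the smaller ID.
    List<Long> top(int n) {
        List<Map.Entry<Long, Integer>> entries =
                new ArrayList<Map.Entry<Long, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<Long, Integer>>() {
            public int compare(Map.Entry<Long, Integer> a, Map.Entry<Long, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        List<Long> result = new ArrayList<Long>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

This version keeps every count in memory indefinitely and sorts on each query, which is fine for illustration. A real-time version would more likely count over a sliding time window so that older tweets drop out of the ranking.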
REQUIREMENTS
- Laptop with a Java development environment that includes JDK 1.6 (http://www.oracle.com/technetwork/java/javase/downloads/index.html), Eclipse (http://www.eclipse.org/downloads/), and Maven (http://wiki.eclipse.org/Maven_Integration).
- Twitter username/password (for authenticating with the Twitter API)
- Knowledge of Twitter4j (http://twitter4j.org/en/index.html) is helpful, but not required.
IMPORTANT: Due to the level of interest in Storm, this session is intended for developers who already work with large data sets and are interested in exploring Storm. Future sessions will focus on other technologies/approaches for data processing and visualization.
