When we sent out the member survey a few months ago, the most frequent request was to set up a hands-on meetup where folks could code and deploy some of the Big Data techniques we have heard about in our past meetups. I am happy to announce that the next two meetups will be focused on this goal.
In the spirit of Wisconsin's favorite sport, we will be using a large dataset of NFL data from Jesse Anderson (http://www.jesse-anderson.com/2013/01/nfl-play-by-play-analysis/). Jesse took NFL data, arrest data, and weather data, combined them (using MapReduce), and made the result queryable (using Hive). Here is another link about this project:
We will go over the concepts of MapReduce and Hive, and show examples of putting data in HDFS, running the MapReduce job, running a SQL query through HUE, etc.
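If MapReduce is new to you, here is a rough sketch of the idea using ordinary shell tools. It is not the job we will run in class, just an illustration of the map, shuffle, and reduce steps. The sample rows and the team,play_type layout are made up for this example and are not the real nfldata format.

```shell
# Hypothetical sample records: team,play_type (NOT the real nfldata schema)
plays='GB,pass
CHI,run
GB,run
GB,pass
CHI,pass'

# Map:     cut emits one key per record (the team)
# Shuffle: sort brings identical keys next to each other
# Reduce:  uniq -c counts each group of identical keys
# awk only tidies the output into key=count form
counts=$(printf '%s\n' "$plays" | cut -d, -f1 | sort | uniq -c | awk '{print $2 "=" $1}')

echo "$counts"
```

Hadoop follows the same three steps, but spreads the map and reduce work across many machines and reads its input from HDFS instead of a shell variable.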
Before class if possible:
Downloading the class materials ahead of time will make sure we can get right to the Hadoop!
• Please download the Cloudera Quickstart VM - available in VMware, KVM, and VirtualBox formats: http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html
• For help and additional information on the quickstart VM: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
Run the VM. Feel free to check out Cloudera Manager and HUE from the splash screen.
Then you'll clone Jesse Anderson's NFL project. Open up a terminal window:
[cloudera@localhost ~]$ cd workspace/
[cloudera@localhost workspace]$ git clone https://github.com/eljefe6a/nfldata
Bring your laptop to class!
I hope everyone can make it. I'm excited!