
Hands-on Programming using Hadoop, MapReduce and Hive

Hosted by Pitt F.

Details

Hi everyone,

When we sent out the member survey a few months ago, the most frequent request was for a hands-on meetup where folks could code and deploy some of the Big Data techniques we have heard about in past meetups. I am happy to announce that the next two meetups will be focused on this goal.

In the spirit of Wisconsin's favorite sport, we will be using a large NFL dataset from Jesse Anderson (http://www.jesse-anderson.com/2013/01/nfl-play-by-play-analysis/). Jesse combined NFL play-by-play data with arrest data and weather data using MapReduce, then made the result queryable using Hive. Here is another link about this project:
http://techcrunch.com/2013/08/04/how-data-changes-preconceptions-about-nfl-football-the-weather-and-the-parallel-universe/
We will go over the concepts of MapReduce and Hive, and show examples of putting data into HDFS, running the MapReduce job, and running a SQL query through HUE.
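If you want a preview of the MapReduce model before class, here is a minimal local sketch in bash. It assumes made-up comma-separated `team,yards` records. This is not the nfldata schema and not Jesse's code. The `map` and `reduce` function names are also hypothetical. `sort` stands in for Hadoop's shuffle step, which groups all values for the same key before they reach a reducer. This mirrors the `cat input | mapper | sort | reducer` trick that people often use to test Hadoop Streaming jobs on a single machine.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map (hypothetical): turn each "team,yards" record into a "team<TAB>yards" key/value pair.
map() { awk -F',' '{ print $1 "\t" $2 }'; }

# Reduce (hypothetical): input arrives sorted by key, so all values for one team
# are adjacent; sum them and emit one "team<TAB>total" line per team.
reduce() {
  awk -F'\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }'
}

# Made-up sample plays (NOT real NFL data); sort plays the role of the shuffle.
printf '%s\n' 'GB,12' 'CHI,3' 'GB,-2' 'MIN,7' 'CHI,10' \
  | map | LC_ALL=C sort -t $'\t' -k1,1 | reduce
```

Running it prints one tab-separated total per team: CHI 13, GB 10, MIN 7. On a real cluster, Hadoop runs many mappers in parallel and performs the sort/group step across machines. The idea is the same, but it operates at a much larger scale.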

Before class, if possible:

Downloading the class materials ahead of time will make sure we can get right to the Hadoop!
• Please download the Cloudera QuickStart VM, available in VMware, KVM, and VirtualBox formats: http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html
• For help and additional information on the QuickStart VM: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
Run the VM. Feel free to check out Cloudera Manager and HUE from the splash screen.

Next, clone Jesse Anderson's NFL project. Open a terminal window in the VM and run:
[cloudera@localhost ~]$ cd workspace/
[cloudera@localhost workspace]$ git clone https://github.com/eljefe6a/nfldata

Bring your laptop to class!

I hope everyone can make it. I'm excited!

Thanks,
Pitt Fagan

Big Data Madison