Hands-on Hadoop with MapReduce, Hive and Impala


Details
Only two entrances to the US Bank Building are open after 6pm: 1) the Cass Street entrance (accessed through the Michigan St. level) and 2) the US Bank parking structure (unlocked until 7pm). The elevators lock at 6pm, after which an employee badge is needed to reach the Galleria floor (where the Vandenberg Room is located).
Ryan Bosshart, Cloudera, will be our speaker.
Description:
In the spirit of Wisconsin's favorite sport, we will be walking through how NFL play-by-play data can be processed and analyzed using open source tools in the Hadoop ecosystem. This project was created by Jesse Anderson ( http://www.jesse-anderson.com/2013/01/nfl-play-by-play-analysis/ ). Jesse processed and combined an NFL dataset, player arrest data, and weather data using MapReduce and Hive. The resulting dataset could then be analyzed to understand: does Peyton really not like cold weather? Which is the baddest NFL team?
This will all be run from the Cloudera Quickstart VM, and those interested are welcome to bring their laptops to follow along, gain hands-on experience, and ask questions. We will go over the concepts of MapReduce, Hive, and Impala. We will show examples of putting data in HDFS, running the MapReduce job, running a SQL query through HUE, etc.
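For anyone who wants a feel for the MapReduce idea before the session, here is a rough local sketch using ordinary shell tools rather than Hadoop. The sample lines and the team,play_type layout are made up for illustration and are not taken from the NFL dataset; the point is only the map, shuffle, reduce shape of the computation.

```shell
#!/bin/sh
# Hypothetical play-by-play records (team,play_type); invented sample data,
# not the format of the real NFL dataset.
plays='GB,PASS
GB,RUSH
DEN,PASS
GB,PASS
DEN,PASS'

# Map:     cut emits one key per record (the play type in field 2).
# Shuffle: sort brings identical keys next to each other.
# Reduce:  uniq -c counts each run of identical keys; awk reorders the
#          output to "key<TAB>count".
counts=$(printf '%s\n' "$plays" | cut -d, -f2 | sort | uniq -c | awk '{print $2 "\t" $1}')

printf '%s\n' "$counts"
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the sort between them, but the flow of data is the same.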
For those who would like to follow along, please do the following before the session:
1) Please download the Cloudera Quickstart VM - available in VMware, KVM, and VirtualBox formats: http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html
   For help and additional information on the Quickstart VM: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
   Please note there are minimum resource requirements for running the Cloudera Quickstart VM.
2) Run the VM. Feel free to check out Cloudera Manager and HUE from the splash screen.
3) Clone Jesse Anderson's NFL project. Open up a terminal window and enter the following:
[cloudera@localhost ~]$ cd workspace/
[cloudera@localhost workspace]$ git clone https://github.com/eljefe6a/nfldata
Bring your laptop to class!
