Hands-on Hadoop with MapReduce, Hive and Impala


Details
The only two entrances to the US Bank Building that are open after 6pm are 1) Cass Street (accessed through the Michigan St. Level) and 2) through the US Bank parking structure (unlocked until 7pm). The elevators will lock at 6pm and will need an employee badge to get to the Galleria floor (where the Vandenberg Room is located).
Milwaukee Big Data User Group (BDUG) Members:
Join us for our next meeting on Tuesday evening, April 8th, for a continuation of our last session. By member request, last month’s guest speaker Ryan Bosshart of Cloudera will rejoin us to conduct the hands-on workshop.
Please reference the notes from Ryan below, download the Cloudera QuickStart VM, install it, and set up the code workspace BEFORE you come to the workshop. The goal of this session is to lead members through the actual experience of working with Hadoop at the command-line level. While some code will be explained in detail, much of the work will use the sample code so you can personally experience how work is done with the core tools native to the Hadoop environment. Get your geek hat on and come ready to experience and learn.
Thank you again to Baird and member Andy Wenzel for providing a place to meet. Food and beverages will be available prior to the workshop.
Best!
Leadership Team – Milwaukee BDUG
INSTRUCTIONS TO BEST PREPARE FOR THE WORKSHOP
- This workshop is a continuation of the prior session and will be more informal and hands-on. The goal is to introduce the audience to specific ways they can interact with Hadoop by performing tasks such as:
a. Data ingestion
b. Launching MapReduce jobs
c. Creating metadata
d. Interacting with HUE
e. Querying data using Hive and Impala
This specific project was created by Jesse Anderson ( http://www.jesse-anderson.com/2013/01/nfl-play-by-play-analysis/ ). Jesse processed and combined an NFL play-by-play dataset, player arrest data, and weather data using MapReduce and Hive. The resulting dataset can then be analyzed to answer questions like: does Peyton really not like cold weather?
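To give a flavor of the kind of Hive and Impala querying the workshop covers, a query against the combined dataset might look like the sketch below. The table and column names (plays, temperature, passer_rating, quarterback) are illustrative assumptions, not the actual schema from Jesse's project:

```sql
-- Hypothetical sketch: table and column names are assumed for
-- illustration and may differ from the nfldata project's schema.
-- Average passer rating by game temperature for one quarterback.
SELECT temperature,
       AVG(passer_rating) AS avg_rating
FROM plays
WHERE quarterback = 'Peyton Manning'
GROUP BY temperature
ORDER BY temperature;
```

The same statement can run in both Hive and Impala once the table metadata has been created, which is one of the tasks the workshop walks through.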
- The entire workshop will be run from the Cloudera QuickStart VM, so please do the following before the session:
• Please download the Cloudera QuickStart VM; be sure to use the CDH 4.6 version! It is available in VMware, KVM, and VirtualBox formats: http://www.cloudera.com/content/support/en/downloads/download-components/download-products.html
• For help and additional information on the QuickStart VM, see: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html . Please note that there are minimum resource requirements for running the Cloudera QuickStart VM. I'm happy to help with any Hadoop-related problems, but I'm not a VM expert and cannot help with VMware or VirtualBox issues.
• Run the VM. Open a browser, go to Cloudera Manager, and make sure the services are running (you may need to start them). Feel free to explore Cloudera Manager and HUE from the splash screen.
• Clone Jesse Anderson's NFL project. Open a terminal window and enter the following two commands:
[cloudera@localhost ~]$ cd workspace/
[cloudera@localhost workspace]$ git clone https://github.com/eljefe6a/nfldata
Bring your laptop with your workspace set up to class! Ryan will arrive early to answer questions.