Large Scale Machine Learning Workshop #3: Hive and Mahout

Hosted By
Omar O.

Details

Hello all! After getting good feedback from workshop #2, we’ll be focusing on hands-on tasks in workshop #3. We’ll be using Hive queries to create a table from a large-scale dataset. Once we have the data in a convenient form, we’ll be using Mahout to create a machine learning model.

We’re also fortunate to have Cloudera attending to give a presentation on Cloudera Director. This is the tool you’ll be using to bring up clusters on AWS machines, and is a follow-on from the Cloudera Manager presentation in workshop #2.

PLEASE ARRIVE AT THE MEETUP WITH YOUR CLUSTER RUNNING. The pre-work has all the steps you need to spin up your cluster. The cluster costs only $1 per hour, so you can bring it up ahead of time and leave it running until the workshop.

Agenda

  • Cloudera will present details of Cloudera Director.

  • Omar will give a brief introduction to the tools we’re using in the meetup: Hive and Mahout.

  • Jaya will give a hands-on tutorial on using Hive for ETL of data.

  • Omar will give a hands-on tutorial on Mahout to train a model on the Hive tables.

  • Finally, we’ll have time for questions and wrap up the session.

Pre-work

The pre-work for this week can be found in the Week3 folder in the Hadoop Dropbox folder: http://bit.ly/acmawshadoop . In the pre-work you’ll be spinning up a cluster using Cloudera Director and checking it in Cloudera Manager. The pre-work takes 90 minutes, but don’t be scared! Only about 20 minutes of hands-on work is needed; the other 70 minutes are spent downloading and installing software on the cluster.

Getting there

The logistics for the meetup can be found here: http://bit.ly/acmawshadoopinfo . This includes directions to the AWS office and background information.

Any other questions?

If you have any other questions, please leave a comment below.

Austin ACM SIGKDD - Austin's Big Data Machine Learning Group
AWS Austin Office
11501 Alterra Parkway, 2nd Floor, Conf Rooms AUS11 02.202 and 02.201 · Austin, TX