Large Scale Machine Learning Workshop #3: Hive and Mahout

Name: Large Scale Machine Learning Workshop #3: Hive and Mahout
Start: 2016-03-02T18:30:00-06:00
End: 2016-03-02T20:45:00-06:00
Location: AWS Austin Office

Hosted by Omar O.

Austin ACM SIGKDD - Austin's Big Data Machine Learning Group

Details

Hello all! After getting good feedback from workshop #2, we’ll be focusing on hands-on tasks in workshop #3. We’ll be using Hive queries to create a table from a large-scale dataset. Once we have the data in a convenient form, we’ll be using Mahout to create a machine learning model.

We’re also fortunate to have Cloudera attending to give a presentation on Cloudera Director. This is the tool you’ll be using to bring up clusters on AWS machines, and the presentation follows on from the Cloudera Manager presentation in workshop #2.

PLEASE ARRIVE AT THE MEETUP WITH YOUR CLUSTER RUNNING. The pre-work has all the steps you need to spin up your cluster. The cluster costs only $1 per hour, so you can start it ahead of time and leave it running until the workshop.

Agenda

Cloudera will present details of Cloudera Director.
Omar will give a brief introduction to the tools we’re using in the meetup: Hive and Mahout.
Jaya will give a hands-on tutorial on using Hive for ETL of data.
Omar will give a hands-on tutorial on Mahout to train a model on the Hive tables.
Finally, we’ll have time for questions and wrap up the session.

Pre-work

The pre-work for this week can be found in the Week3 folder in the Hadoop Dropbox folder http://bit.ly/acmawshadoop . In the pre-work you’ll be spinning up a cluster using Cloudera Director and checking it in Cloudera Manager. The pre-work takes 90 minutes, but don’t be scared! Only about 20 minutes of hands-on work is needed; the other 70 minutes are spent downloading and installing software on the cluster.

Getting there

The logistics for the meetup can be found here: http://bit.ly/acmawshadoopinfo . This includes directions to the AWS office, and background information.

Any other questions?

If you have any other questions, please leave a comment below.
