Advanced Hadoop Based Machine Learning

Hosted by Austin ACM SIGKDD - Austin's Big Data Machine Learning Group

The course is limited to the first 42 people who sign up each week.

Austin ACM SIGKDD Advanced Hadoop Based Machine Learning

Austin ACM SIGKDD is offering a two-semester course on Hadoop Based Machine Learning. Participants who complete the course will receive an official ACM certificate. A separate certificate will be offered for each semester: Hadoop Based Machine Learning for the fall, and Advanced Hadoop Based Machine Learning for the spring. You do not have to be a member of ACM or SIGKDD to take the course. There is no cost for the course. The fall course is now closed except to those who attended the first four meetings of the fall semester.

The course will meet every Wednesday evening from 7:00 pm to 8:30 pm at PayPal for the fall and spring semesters. The specific dates are below. The location is PayPal, 7700 W. Palmer Lane, Austin, Texas 78717, Building D, Conference Room. Bring a picture ID to get into the building.

The course will cover Hadoop-based machine learning with a three-pronged approach. The first prong will be taught from the book “Data-Intensive Text Processing with MapReduce” by Jimmy Lin and Chris Dyer. The Cloud9 MapReduce library written by Jimmy Lin for the book will also be reviewed. For the second prong, one session a month will be devoted to a machine learning technique implemented in Mahout using MapReduce. The third prong will be bi-monthly reviews of the latest research papers on machine learning techniques using MapReduce.

Prerequisites: The course will cover the mathematics of machine learning, so an understanding of linear algebra, probability, statistics, and optimization will be useful. All the coding examples will be in Java.

Required Text: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer. The book is available for free at the below URL.

Recommended Text: Hadoop: The Definitive Guide by Tom White

Grading: Attendance at 70% of the sessions each semester, plus an end-of-semester take-home exam of 20 multiple-choice questions.

Fall Semester

Session  Date        Source  Chapters   Topic
1        09/04/2013  Book    Ch. 1 & 2  Map Reduce Basics
2        09/11/2013  Book    Ch. 1 & 2  Map Reduce Basics
3        09/25/2013  Book    3.1, 3.2   MR Algorithm Design - Aggregation
4        10/02/2013  Mahout  -          Mahout math and collections
5        10/09/2013  Book    3.3, 3.4   MR Algorithm Design - Counting & Sorting
6        10/16/2013  Book    3.5, 3.6   MR Algorithm Design - Joins
7        10/23/2013  Papers  -          QR Factorization
8        10/30/2013  Mahout  -          Classifier Naive Bayes
9        11/06/2013  Book    4.1-7      Inverted Indexing
10       11/13/2013  Slides  -          Singular Value Decomposition
11       11/20/2013  Video   -          Singular Value Decomposition
12       12/04/2013  Papers  -          Singular Value Decomposition
13       12/11/2013  Mahout  -          Singular Value Decomposition
14       12/18/2013  Slides  -          Latent Semantic Indexing

Spring Semester

Session  Date        Source  Chapters   Topic
1        01/15/2014  Book    5.1        Graphs
2        01/22/2014  Book    5.2        Graphs - Parallel Breadth-First Search
3        01/29/2014  Book    5.3        Graphs - PageRank
4        02/05/2014  Book    5.4, 5.4   Graphs - Issues
5        02/12/2014  Book    6.1        Expectation Maximization
6        02/19/2014  Mahout  -          Clustering - Spectral Clustering
7        02/26/2014  Book    6.2        Hidden Markov Models
8        03/05/2014  Book    6.3        EM in MapReduce
9        03/12/2014  Papers  -          Decision Trees
10       03/19/2014  Mahout  -          Decision Trees - Random Forest
11       03/26/2014  Book    6.4        Case Study
12       04/02/2014  Book    6.5, 6.6   EM Like Algorithms
13       04/09/2014  Book    Ch. 7      Closing Remarks
14       04/16/2014  Mahout  -          Hidden Markov Models
15       04/23/2014  Mahout  -          Clustering - Canopy Clustering
16       04/30/2014  Papers  -          Bag of Little Bootstraps
17       05/07/2014  Papers  -          Stochastic Subgradient Optimization