Machine Learning on BigData w. Map Reduce

Course objectives:
Participants will learn to adapt and execute machine learning algorithms in the map reduce framework.  Participants should finish the class able to author their own machine learning algorithms for map reduce and to run them on Amazon Web Services.

Participants will learn to use Python code to author mappers and reducers for “hadoop-streaming”.  For most of the class we will employ “mrjob”, an open-source framework developed at Yelp.  mrjob lets class members program mappers and reducers in Python.  The framework then submits the mapper-reducer job to run locally without Hadoop, on Amazon Web Services, or on a private Hadoop cluster.  This simplifies the programming tasks.
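To give a feel for what a mapper and a reducer look like, here is a minimal word-count sketch in the hadoop-streaming style, where records are tab-separated "key, value" text lines. It is an illustration only, not class material: the function names (`mapper`, `reducer`, `run_local`) are made up for this example, and the sort inside `run_local` stands in for the shuffle that Hadoop performs between the two phases. With mrjob, the same logic would live in the mapper and reducer methods of a job class instead of in free functions.

```python
import itertools


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.strip().lower().split():
            yield "%s\t%d" % (word, 1)


def reducer(sorted_records):
    """Sum the counts for each word; assumes records arrive sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["the quick brown fox", "the lazy dog"]):
        print(record)
```

Keeping the mapper and reducer as plain generators makes them easy to test on a laptop before any cluster is involved.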


Registration covers the cost of all 5 sessions.  If you register at least 5 days before the class, the price is $300.  If you register in the last 5 days, the price is $400.  You can register by credit card on Eventbrite, or you can pay by check or cash at the first class meeting.



The class will be delivered by webcast - usually several people want to attend the class remotely.  To take the class by webcast, you'll need to register at least 24 hours before class starts.


Here's a schedule to give an idea of what we intend to cover.  We can modify the schedule to match class interests - replace one of the algorithms with another or cover more algorithms at less depth etc.  We'll discuss the topics at the first class meeting.


Week/Date: Topic

Week 1: Implementing Algorithms on Big Data - mrjob installation
  MapReduce, Hadoop Streaming, Mahout, Amazon (AWS, EMR)

Week 2: Clustering
  k-means, Canopy Clustering

Week 3: Supervised Learning
  EM algo for mixture model, using canopy for speedup

Week 4: Other ML Tasks
  Regularized Regression - glmnet algo for elasticnet
  SVM - Pegasos algo for two-class and one-class, extensions
  Recommender Engine - Matrix Factorization by Gradient Descent

Week 5: Student Projects

Other topics: Decision Trees - Google PLANET, Text Mining, Ensemble Methods
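
As an example of adapting an algorithm from the schedule to map reduce, one k-means iteration can be sketched as follows: the mapper assigns each point to its nearest centroid, and the reducer averages the points in each cluster. This is a rough single-process sketch, not the class implementation. All names (`nearest`, `mapper`, `reducer`, `kmeans_step`) are hypothetical, and the dictionary in `kmeans_step` stands in for the shuffle that a real framework would do. In a real job, the current centroids would also have to be shipped to every mapper, and a driver would repeat the step until the centroids stop moving.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def mapper(point, centroids):
    """Emit (cluster index, (point, 1)) so the reducer can sum and count."""
    yield nearest(point, centroids), (point, 1)


def reducer(cluster, values):
    """Average all points assigned to one cluster to get its new centroid."""
    total, count = None, 0
    for point, n in values:
        total = list(point) if total is None else [t + p for t, p in zip(total, point)]
        count += n
    yield cluster, [t / float(count) for t in total]


def kmeans_step(points, centroids):
    """One iteration: map every point, group by key, then reduce each group."""
    groups = defaultdict(list)
    for point in points:
        for key, value in mapper(point, centroids):
            groups[key].append(value)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for key in groups:
        for cluster, centre in reducer(key, groups[key]):
            new_centroids[cluster] = centre
    return new_centroids
```

Emitting a count of 1 alongside each point means the same reducer logic could also serve as a combiner that pre-aggregates partial sums, provided it emitted sums and counts rather than the final average.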

Prerequisites:
- Facility with undergrad-level math and stats (vector calculus, density functions, etc.)
- Comfortable programming in basic Python (version 2.6 or 2.7, NOT version 3).
- You'll also need to develop some familiarity with NumPy (the "random" family of functions, matrix(), array())
- Install mrjob and boto (both are Python packages)
- Familiarity with basic machine learning.

  • Mike B.

    We're not accepting new participants to this class. We've been going for a couple of weeks and there's too much background to reasonably expect to fill in. Watch this space for the next version of the class. Or send me an email if you'd like to be added to the email list for class announcements.

    2 · January 28, 2013

16 went

Your organizer's refund policy for Machine Learning on BigData w. Map Reduce

Refunds are not offered for this Meetup.
