
Class Brochure

Learning “Machine Learning” by Example

A quick overview
This is a 5+5+5 week machine learning course. Several online machine learning courses are available right now from Coursera, Udacity, Caltech, and Stanford. This course is based on the “learning by example” principle: students will be introduced to machine learning algorithms and data analysis techniques by actively working on a real dataset. In the first 5 weeks we will work on the airlines dataset, applying standard machine learning methods for clustering, classification, regression and dimensionality reduction. In the next 5 weeks we will work on the KDD Cup 2012 dataset and implement the 3 winning strategies. In the last 5 weeks we will be ready to work on a real Kaggle competition, and during this last part of the course we will also look at more advanced machine learning techniques.

Course style
In this course we will not follow the traditional lecture style with PowerPoint slides. It will be taught in lab sessions, with one 2 hour session every week. From the first session we will dive into the dataset and start getting familiar with it. Each session will have an assignment. In the first 10 minutes of the class, the instructor will give students the minimum information they need to start working on the problem. As the students progress and finish the subtasks, we will take 10 minute breaks to discuss the theory and the algorithms behind the task. Literature references will be given for further reading. The advantage of this method is that each student has the flexibility to control how deeply to explore the underlying theory. As the course progresses from simple algorithms to more advanced ones in dataset competitions, the students will be divided into groups that take different approaches to solving the problem. Students should feel free to share their experience on the course wiki.

Tools, hardware
The language platform for the course will be R. The nice thing about R is that there are multiple platforms on which to run it. In the course we will use the following:

  • Running programs locally on your laptop. In the beginning we will focus mainly on single-threaded implementations of the algorithms
  • RHadoop This distributed version of R on Hadoop will help us scale to real data size problems
  • RevoScaleR We will see if Revolution Analytics can provide us with a license for the scope of the course
  • RHIPE This is another distributed version of R on Hadoop
  • rHPCC A distributed version of R on HPCC Systems ECL
    We will mainly focus on RHadoop/RHIPE. The nice thing about R is that you can code once and then run it everywhere (sort of). Students who want to focus more on performance and less on data analysis can code and test a limited suite of algorithms on all the platforms, including Erlang and GraphLab.

    Starting date: Friday, November 9th
    Place: LogicBlox Inc, 1349 West Peachtree St, 18th floor
    Time: 4:00pm to 6:00pm


    Week 1 (11/9/2012):
    Install R
    Load the Airline dataset
    Visualize your dataset with PCA and MDS
    Select some attributes, sample and run your first linear regression
    A 10 minute description of regression
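
As a rough illustration of the regression step in Week 1, the sketch below fits a one-variable least-squares line. The course itself uses R; plain Python is used here only to keep the example self-contained. The function name and the toy delay values are hypothetical and are not taken from the airline dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: departure delay vs. arrival delay, in minutes.
dep_delay = [0.0, 5.0, 10.0, 20.0, 30.0]
arr_delay = [2.0, 7.0, 12.0, 22.0, 32.0]  # constructed as dep_delay + 2

intercept, slope = fit_simple_linear_regression(dep_delay, arr_delay)
print(intercept, slope)
```

Because the toy arrival delays are built as departure delay plus two minutes, the fit recovers a slope of 1 and an intercept of 2.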

    Week 2 (11/16/2012):
    Run your first decision tree
    10 minute presentation of decision trees
    Run your first support vector machine
    10 minute presentation of support vector machine

    Week 3 (11/30/2012):
    Logistic Regression
    Run logistic regression on subsamples of the data
    10 minute presentation of logistic regression
    Logistic Regression versus Linear Regression
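
Where linear regression predicts an unbounded number, logistic regression passes the same linear score through a sigmoid so that the output can be read as a probability. The following is a minimal sketch of that idea, trained by batch gradient descent on the log loss. It is written in plain Python rather than the course's R, and the function names, learning rate and toy data are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, labels, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y  # derivative of the log loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical toy data: x is a centered delay feature, y = 1 if the flight was late.
# The classes overlap on purpose so that the weights stay finite.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic_regression(xs, labels)
p_low = sigmoid(w * -2.0 + b)
p_high = sigmoid(w * 2.0 + b)
print(w, b, p_low, p_high)
```

On this toy data the learned weight is positive, so the predicted probability of a late arrival rises with the feature value.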

    Week 4 (12/7/2012):
    Run your first k-means clustering
    10 minute presentation of k-means
    Build a distributed k-means (optional)
    Finding Outliers.
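
The k-means step in Week 4 can be sketched as Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This is an illustrative single-machine version in plain Python, not the course's R code. Passing the initial centroids explicitly (instead of choosing them at random) is an assumption made to keep the result reproducible, and the points are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm with a fixed number of iterations."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignments)
```

A point that lies far from every centroid after convergence is one simple candidate for an outlier.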

    Week 5 (12/14/2012):
    Predicting the delay of a flight
    Building a hierarchical regression model
    Combine clustering/classification with Regression
    Use a distributed regression
    Tune and test your system



    Week 6:
    Presentation of the first track of the KDD Cup 2012 competition
    Presentation of the 3rd-place winning solution, “Social Network and Click-through Prediction with Factorization Machines”
    Introduction to Stochastic Gradient Descent
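
Stochastic gradient descent updates the model parameters after each individual example instead of after a full pass over the data. The sketch below applies it to a one-variable squared-error fit. It is written in plain Python rather than the course's R, and the function name, step size, epoch count and toy data are assumptions chosen for illustration.

```python
import random


def sgd_linear(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b with one gradient step per training example."""
    rng = random.Random(seed)  # fixed seed so the visiting order is reproducible
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 for this single example.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = sgd_linear(xs, ys)
print(w, b)
```

With noise-free data the updates settle close to w = 2 and b = 1. On noisy data a constant step size keeps the parameters jittering around the optimum, which is why decaying step sizes are common in practice.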

    Week 7:
    Building a simple collaborative filtering recommendation engine
    Matrix Factorization
    Singular Value Decomposition
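
One common way to build a simple collaborative filtering recommender is to approximate the ratings matrix as a product of low-rank user and item factors, fitting only the observed entries. The sketch below does this with per-rating gradient updates, which is a different technique from computing an exact singular value decomposition. It uses plain Python rather than the course's R, and the ratings, rank, step size and function names are hypothetical.

```python
import math
import random


def predict(user_vec, item_vec):
    """Predicted rating: dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(user_vec, item_vec))


def factorize(ratings, n_users, n_items, rank=2, learning_rate=0.02,
              epochs=500, seed=0):
    """Learn factors from observed ratings with per-rating gradient steps."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    item_f = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        # Ratings are visited in a fixed order to keep the run reproducible.
        for (u, i), r in ratings.items():
            err = r - predict(user_f[u], item_f[i])
            for f in range(rank):
                pu, qi = user_f[u][f], item_f[i][f]
                user_f[u][f] += learning_rate * err * qi
                item_f[i][f] += learning_rate * err * pu
    return user_f, item_f


def rmse(ratings, user_f, item_f):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(user_f[u], item_f[i])) ** 2
                for (u, i), r in ratings.items())
    return math.sqrt(total / len(ratings))


# Hypothetical observed ratings keyed by (user index, item index).
ratings = {(0, 0): 5.0, (0, 1): 3.0, (0, 2): 1.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}

untrained = factorize(ratings, 3, 3, epochs=0)  # same seed, so same starting factors
trained = factorize(ratings, 3, 3)
rmse_before = rmse(ratings, *untrained)
rmse_after = rmse(ratings, *trained)
print(rmse_before, rmse_after)
```

The pair (1, 1) is unobserved in the toy data, so `predict(trained[0][1], trained[1][1])` gives the model's guess for a rating that user 1 never supplied.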

    Week 8:
    Non-negative Matrix Factorization

    Week 9:
    Implementation of FM algorithm
    Distributed version of FM (optional)
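
For orientation, a second-order factorization machine scores a feature vector x as a global bias, plus a linear term, plus one interaction term for every pair of features, where each pairwise weight is the dot product of two learned factor vectors. The sketch below shows only the prediction step, not training, in plain Python rather than the course's R. It computes the pairwise term twice: once directly over all pairs, and once with the standard algebraic rewrite that is linear in the number of features. All parameter values are made up.

```python
def fm_predict_naive(x, w0, w, v):
    """FM prediction with an explicit loop over all feature pairs i < j."""
    n = len(x)
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict(x, w0, w, v):
    """Same prediction using, per factor f,
    0.5 * ((sum_i v[i][f] * x[i])**2 - sum_i (v[i][f] * x[i])**2)."""
    n = len(x)
    k = len(v[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical parameters: 3 features, 2 latent factors per feature.
x = [1.0, 0.0, 1.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]

fast = fm_predict(x, w0, w, v)
naive = fm_predict_naive(x, w0, w, v)
print(fast, naive)
```

For this input both functions return 1.9, up to floating-point rounding: 0.5 from the bias, 0.4 from the linear term, and 1.0 from the single active pair of features 0 and 2.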

    Week 10:
    Application of the FM algorithm on the KDD cup dataset

    Frequently Asked Questions
    Which day of the week will the course take place? Every Friday and Thursday, 4pm to 6pm.
    When does the course begin? November 9th.
    Where do you meet every week? The course will take place in the LogicBlox office.
    Can people participate remotely? We are seriously considering it.
    Does the course include homework? Not in the way you know it from college. However, you are not expected to finish the lab assignments within the 2 hour sessions. You should expect to spend at least another 2 hours, or more, depending on how deeply you want to go into machine learning.
    Is there a limit on the number of people attending the class? Unfortunately we cannot host more than 15 people. We can probably accommodate about 20 more remote students.
    Will you provide the hardware for the course? You will have to bring your own laptop. We will try to get a cloud cluster for running the large-scale jobs.
    Do I have to pay? No, the course is free.

Table of Contents

Page title                              Most recent update          Last edited by
ICML 2013 Review                        August 2, 2013 4:15 PM      nikolaos v.
Lesson 8                                April 10, 2013 1:57 PM      nikolaos v.
Lesson 7                                April 3, 2013 11:44 AM      nikolaos v.
Other clustering                        December 6, 2012 3:33 PM    nikolaos v.
Distributed k-means                     December 5, 2012 11:23 PM   nikolaos v.
Introduction to k-means                 December 5, 2012 11:09 PM   nikolaos v.
Download a virtual machine              November 28, 2012 9:48 AM   nikolaos v.
Lesson 3                                December 6, 2012 4:21 PM    nikolaos v.
Decision Tree                           November 16, 2012 3:21 PM   nikolaos v.
Regression Tree                         November 16, 2012 3:10 PM   nikolaos v.
Lesson 2 Run a big logistic regression  November 16, 2012 2:33 PM   nikolaos v.
Lesson 2 Logistic Regression            November 16, 2012 2:23 PM   nikolaos v.

Our Sponsors

  • Ismion Inc

    Provides the instructor who teaches the course

  • LogicBlox Inc

    Offers space, equipment and instructor payment

  • Predictix

    Pays for cloud time and for TAs

  • Kabbage

    Space and great pizza
