
Random Forest Implementation in MapReduce

In this talk, we'll describe some of the technical details of Alpine Forest, a Random Forest implementation. In particular, we'll focus on the challenges of implementing machine learning algorithms like Alpine Forest on a MapReduce framework, especially the high cost of MapReduce iterations and the limited memory available on each node.

We'll also go over a couple of different approaches to implementing decision tree learners (which form the basis of Alpine Forest) on a distributed system like MapReduce. We'll conclude by going over some details of Spark and how it can help with the next version of Alpine Forest.
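To make the idea of a distributed decision tree learner concrete, here is a minimal sketch of one common pattern: each mapper counts labels per feature value on its own partition, a reducer merges the counts, and the split with the highest information gain is chosen from the merged statistics. This is an illustration only, not Alpine Forest's actual implementation. It runs in plain Python rather than on Hadoop, it assumes categorical features with one child per value, and the names `map_partition`, `reduce_stats`, and `best_split` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a label-count table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def map_partition(rows):
    """Mapper (hypothetical): count labels per (feature index, value) on one partition."""
    stats = defaultdict(Counter)
    for features, label in rows:
        for index, value in enumerate(features):
            stats[(index, value)][label] += 1
    return stats


def reduce_stats(partials):
    """Reducer (hypothetical): merge the per-partition count tables."""
    merged = defaultdict(Counter)
    for partial in partials:
        for key, counts in partial.items():
            merged[key].update(counts)
    return merged


def best_split(merged):
    """Pick the feature index with the highest information gain."""
    by_feature = defaultdict(dict)
    for (index, value), counts in merged.items():
        by_feature[index][value] = counts

    best = None
    for index, groups in by_feature.items():
        parent = Counter()
        for counts in groups.values():
            parent.update(counts)
        n = sum(parent.values())
        # Weighted entropy of the children after splitting on this feature.
        children = sum(sum(c.values()) / n * entropy(c) for c in groups.values())
        gain = entropy(parent) - children
        if best is None or gain > best[1]:
            best = (index, gain)
    return best


# Two toy partitions of (features, label) rows; the data is made up.
partition_1 = [(("sunny", "hot"), "no"), (("rainy", "hot"), "yes")]
partition_2 = [(("sunny", "cool"), "no"), (("rainy", "cool"), "yes")]

merged = reduce_stats([map_partition(partition_1), map_partition(partition_2)])
feature_index, gain = best_split(merged)
print(feature_index, gain)
```

Only the small count tables leave the mappers, not the rows themselves, which helps with limited memory on each node. In this sketch, one such pass picks a split for a single tree node, so growing a deeper tree means repeating the pass, which is where the cost of MapReduce iterations starts to matter.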


  • Joanne

    Will this be recorded?

    February 12, 2014

  • Pinar D.

    I am sad that I missed this talk. I wonder if there is any recording available? Could you provide a link, if so? Thanks.

    February 27, 2014

  • Ed M

    Any bike parking available?

    February 20, 2014

  • Sung H C.

    Hello everyone, here are some "minor" prerequisites (good to know):

    Machine Learning Terminology:
    supervised learning, classification, regression, feature, label, overfitting, variance/bias, etc.

    Entropy (under Information Theory)

    Basics of MapReduce

    1 · February 20, 2014

  • Joanne

    How technical will the talk be? Or will it teach people how to use Alpine?

    2 · February 19, 2014

    • DB T.

      It will be a technical talk. Sung will start with the building block of random forest, the decision tree, and how to learn a tree. He will then move gradually to ensemble tree learning methods: bagging, boosting, and feature randomization.

      2 · February 20, 2014

    • DB T.

      Finally, he will talk about the implementation in MapReduce and the limitations of MR. At the end, we'll discuss how to do parallel tree learning in Spark and build Random Forest on top of it. Of course, he will also discuss why Spark is a better fit than MR for parallel tree learning.

      2 · February 20, 2014

  • Sara A.

    Hi everyone,

    Looking forward to seeing you tomorrow night. Please be warned - the doors to the building are locked at 7pm, so please show up before then!


    February 19, 2014

  • Shengbo

    Could I ask what programming language will be used for this event? Is it Java?

    February 12, 2014

    • Sung H C.

      It won't focus on a particular programming language, but we might show code snippets in Java and Scala.

      1 · February 12, 2014

  • A former member

    Experienced in BI

    February 3, 2014

