
Large-Scale Machine Learning with Apache Spark

We're hosting a series of events on machine learning in Spark.

It's our pleasure to have Xiangrui Meng from Databricks as the first speaker in this series, introducing Spark to data scientists.

For the next meetup on May 1, we will hold a joint event with Cloudera covering part 2 of Spark, MLlib, and a large-scale multinomial logistic regression implementation in Spark.

In the future, we'll talk about a Random Forest implementation in Spark.

Spark is a new cluster computing engine that is rapidly gaining popularity — with over 150 contributors in the past year, it is one of the most active open source projects in big data, surpassing even Hadoop MapReduce. Spark was designed to both make traditional MapReduce programming easier and to support new types of applications, with one of the earliest focus areas being machine learning. In this talk, we’ll introduce Spark and show how to use it to build fast, end-to-end machine learning workflows. Using Spark’s high-level API, we can process raw data with familiar libraries in Java, Scala or Python (e.g. NumPy) to extract the features for machine learning. Then, using MLlib, its built-in machine learning library, we can run scalable versions of popular algorithms. We’ll also cover upcoming development work including new built-in algorithms and R bindings.

Xiangrui Meng is a software engineer at Databricks. He has been actively involved in the development of Spark MLlib since he joined. Before Databricks, he worked as an applied research engineer at LinkedIn, where he was the main developer of an offline machine learning framework in Hadoop MapReduce. His thesis work at Stanford is on randomized algorithms for large-scale linear regression.


  • DB T.

    1 · May 8, 2014

  • Amir Y.

    Would there be a recording of this? Thanks

    1 · April 21, 2014

  • A former member

    Awesome session. Learned a lot from you guys.

    May 2, 2014

  • DB T.

    Thanks again for the venue sponsored by Yelp! The video is available here now.

    April 30, 2014

  • A former member

    Pretty cool overview :) I would have appreciated a bit more algorithmic detail, though.

    1 · April 24, 2014

    • DB T.

      Come to our next meetup! We will go into more detail on how to parallelize the algorithms.

      2 · April 24, 2014

    • Xiangrui M.

      Yes, this was an intro talk. We will reveal more algorithmic/technical details in the coming talks.

      2 · April 25, 2014

  • Dan M.

    Great meetup! Definitely mind-expanding. Looking forward to the next one.

    April 25, 2014

  • Spondon S.

    Sorry, can't make it tonight. Looking forward to the recording.

    1 · April 24, 2014

  • Jenny M.

    Would there be live streaming for this event?

    April 24, 2014

  • DB T.

    Yelp is asking for the list of tonight's attendees for security; please change the RSVP for the people on the waiting list.

    April 24, 2014
