Multinomial Logistic Regression with Apache Spark

Hosted by Silicon Valley Machine Learning

Hacker Dojo

599 Fairchild Drive · Mountain View, CA

How to find us

We will be meeting at Hacker Dojo in the large event conference room and will start at 6:45 sharp. You do not need to be a member of Hacker Dojo to attend. If you cannot find parking, please check Peet's next to Hacker Dojo.


Logistic regression can model not only binary outcomes but also, with some extension, multinomial outcomes. In this talk, DB will walk through the basic idea of binary logistic regression step by step and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm, using the in-memory RDD cache to scale horizontally (in the number of training examples). However, there is a mathematical limitation on scaling vertically (in the number of training features), and many recent applications in document classification and computational linguistics are of this type. He will talk about how to address this problem by using the L-BFGS optimizer instead of the Newton optimizer.
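
For readers who want a concrete picture before the talk, here is a minimal sketch in plain Python (not Spark code, and not the speaker's implementation) of why the algorithm parallelizes over training examples: the multinomial loss and its gradient are sums over data points, so each partition can compute a partial result and the partials can be added together. The function names (softmax, partition_loss_grad, combine, total_loss_grad) and the toy data are hypothetical, chosen only for illustration. An optimizer such as L-BFGS needs only this loss and gradient on each iteration, whereas Newton's method also needs a Hessian whose size grows with the square of the number of parameters.

```python
import math
from functools import reduce


def softmax(scores):
    """Turn per-class scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def partition_loss_grad(weights, partition):
    """Negative log-likelihood and gradient contributed by one partition.

    weights: one weight vector per class; partition: list of (features, label).
    """
    k, d = len(weights), len(weights[0])
    loss = 0.0
    grad = [[0.0] * d for _ in range(k)]
    for x, y in partition:
        scores = [sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
        probs = softmax(scores)
        loss -= math.log(probs[y])
        for c in range(k):
            # Gradient of the loss w.r.t. class c's weights: (p_c - 1[y == c]) * x
            err = probs[c] - (1.0 if c == y else 0.0)
            for j in range(d):
                grad[c][j] += err * x[j]
    return loss, grad


def combine(a, b):
    """Add two partial (loss, gradient) results; this is the 'reduce' step."""
    (loss_a, grad_a), (loss_b, grad_b) = a, b
    merged = [[ga + gb for ga, gb in zip(ra, rb)] for ra, rb in zip(grad_a, grad_b)]
    return loss_a + loss_b, merged


def total_loss_grad(weights, partitions):
    """Map each partition to a partial result, then reduce by addition."""
    return reduce(combine, (partition_loss_grad(weights, p) for p in partitions))


# Toy data (hypothetical): 2 features, 3 classes, split into 2 partitions.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2), ([1.0, 2.0], 1)]
w0 = [[0.0, 0.0] for _ in range(3)]
loss_all, grad_all = total_loss_grad(w0, [data])
loss_split, grad_split = total_loss_grad(w0, [data[:2], data[2:]])
```

Splitting the data into partitions changes only the order of the additions, so loss_split and grad_split agree with loss_all and grad_all up to floating-point rounding. In Spark, the partitions would be slices of a cached RDD and the reduce step would run across the cluster.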

Bio: DB Tsai is a machine learning engineer at Alpine Data Labs. He has recently been working with the Spark MLlib team to add support for the L-BFGS optimizer and multinomial logistic regression upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data Labs, he worked on large-scale optimization of optical quantum circuits as a PhD student at Stanford.