Past Meetup

"Machine Learning in Presto" And "Non-Parametric Bayesian"

153 people went


We are happy to announce a double-talk meetup with Christopher Berner of Facebook and Mohitdeep Singh of Rdio.

This is a joint meetup with SF Machine Learning, so please register in only one place: either here or at SF Machine Learning.

Talk 1: Machine Learning in Presto

Presto is an open source distributed SQL query engine used at Facebook in our Hadoop warehouse. It's typically about 10x faster than Hive, and can be extended to a number of other use cases. One of these extensions adds SQL functions for creating machine learning models and making predictions with them. The aim is to significantly reduce the time it takes to prototype a model by moving the construction and testing of the model into the database.


Christopher Berner works as a software engineer at Facebook on the Presto team. He wrote the ML functionality, and has worked on the query planner, type system, bytecode generator, and many other pieces of Presto. Before Presto, he worked on the newsfeed ranking team developing machine learning models.

Talk 2: Non-Parametric Bayesian: Feature Engineering is machine learning

With the advent of big data, collecting huge amounts of unstructured data has become standard routine. Since little thought is given a priori to what is collected, most of this data is unlabelled. Fortunately, unsupervised learning techniques allow us to explore and understand the structure of such data. But common techniques like k-means or EM approaches don't work well on high-dimensional feature sets (say, text data). Dimensionality reduction techniques are applicable, but then one has to work in a transformed feature space where it is no longer clear what each dimension means (loss of interpretability). Such parametric approaches also suffer from the assumptions built into them (for example, the number of clusters has to be fixed a priori). In this talk, we will discuss non-parametric Bayesian approaches (in particular, the Chinese Restaurant Process metaphor), which allow for a potentially infinite number of clusters instead of a number fixed in advance. We will also demonstrate the results of such approaches on text data.
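As a rough illustration of the Chinese Restaurant Process metaphor mentioned in the abstract, here is a minimal sketch of sampling from its prior over partitions. It is not taken from the talk: the function name `sample_crp`, the concentration parameter `alpha`, and the `seed` argument are all assumptions made for this example. Each new "customer" (data point) sits at an existing "table" (cluster) with probability proportional to the number of customers already there, or opens a new table with probability proportional to `alpha`, so the number of clusters is not fixed in advance.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Sample table assignments from a Chinese Restaurant Process prior.

    Hypothetical helper for illustration; `alpha` (> 0) is the
    concentration parameter controlling how readily new tables open.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[n] = table index of customer n

    for _ in range(num_customers):
        # Existing table k is weighted by its size; the extra last
        # entry (weight alpha) stands for opening a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]

        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)

    return assignments, table_sizes


if __name__ == "__main__":
    assignments, table_sizes = sample_crp(num_customers=100, alpha=2.0)
    print("number of tables:", len(table_sizes))
    print("table sizes:", table_sizes)
```

This only draws from the prior. Clustering real text data would additionally need a likelihood for each cluster and an inference procedure such as Gibbs sampling, which this sketch does not cover.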

Bio: Mohitdeep Singh works at Rdio as a data scientist. He is also a collaborator at Lawrence Berkeley Lab, where he is working on scaling randomized linear algebra techniques on Spark/Edison. His research interests are large-scale machine learning, Bayesian inference, randomized techniques in linear algebra, and probabilistic graphical models. He earned his first master's degree at Carnegie Mellon and is currently pursuing a second master's (part-time) at Georgia Tech.


6:00 pm -- 6:30 pm social

6:30 pm -- 6:35 pm introduction

6:35 pm -- 7:15 pm Chris talk

7:15 pm -- 7:20 pm Q & A for Chris

7:20 pm -- 8:00 pm Mohit talk

8:00 pm -- 8:15 pm Q & A for both Chris and Mohit

8:30 pm ends