Optimizing Performance of ML Models Through a Bayesian Lens w/ Tripadvisor

Analytics & Data Science by Dataiku NY

Online event

This event has passed


Thanks for your interest in this Dataiku NYC Meetup! The health & safety of our attendees & speakers is our primary concern. While this currently proves to be a tricky time for public gatherings, Dataiku is still committed to providing great tech content & facilitating discussions in the data science space. As such, we’ve decided to pivot towards online webinars via our partner platform, BrightTalk.


Tentative Schedule (EST):

7:00pm: Intro
7:05pm: Optimizing Performance of ML Models Through a Bayesian Lens with Tripadvisor
7:45pm: Q&A

Talk Abstract:

Bayesian Imputation of Missing Feature Values in Product Sort & Recommendation at Tripadvisor

Do you encounter missing values in your model features, but don’t give them much thought? I have two goals in this talk: 1) use my work with sort algorithms at Tripadvisor to show how ad-hoc imputation of missing values severely hurts the performance of real-world ML models, and 2) cast the missing value problem as a probabilistic model that one can solve through Bayesian inference. I will end by showing that the most widely used missing value imputation technique in the statistics community (Multiple Imputation by Chained Equations, or MICE, which scikit-learn implements in its IterativeImputer) can be better understood as approximate Bayesian inference in a simple probabilistic model.
This talk should appeal to data- and ML-related researchers of all skill levels. For beginning practitioners, part 1 of my talk will demonstrate why it is important to think carefully about missing values during feature engineering and how to examine their role in a model’s predictive performance. For more experienced attendees, part 2 of my talk will try to build a bridge between the statistical literature on missing value imputation and the world of the machine learning practitioner through a Bayesian lens.
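
For readers who want a feel for the contrast the abstract draws, here is a minimal sketch (not the speaker's code or Tripadvisor's data) that compares ad-hoc mean imputation with scikit-learn's IterativeImputer. The synthetic two-feature dataset, the 30% missingness rate, and the helper name rmse are all assumptions made up for illustration.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): two strongly correlated features.
n = 500
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=n)
X_full = np.column_stack([x1, x2])

# Hide roughly 30% of the second feature, completely at random.
mask = rng.random(n) < 0.3
X_missing = X_full.copy()
X_missing[mask, 1] = np.nan

# Ad-hoc baseline: replace every missing value with the column mean.
X_mean = SimpleImputer(strategy="mean").fit_transform(X_missing)

# Model-based imputation: each feature is regressed on the others
# (the default estimator is BayesianRidge).
X_iter = IterativeImputer(random_state=0).fit_transform(X_missing)


def rmse(a, b):
    """Root-mean-squared error between two arrays (hypothetical helper)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rmse_mean = rmse(X_mean[mask, 1], X_full[mask, 1])
rmse_iter = rmse(X_iter[mask, 1], X_full[mask, 1])
print(f"mean imputation RMSE:      {rmse_mean:.3f}")
print(f"iterative imputation RMSE: {rmse_iter:.3f}")

# With sample_posterior=True, each run draws imputations from the estimator's
# predictive distribution, so repeated runs yield several plausible completed
# datasets, in the spirit of multiple imputation.
draws = np.stack([
    IterativeImputer(sample_posterior=True, random_state=seed)
    .fit_transform(X_missing)[mask, 1]
    for seed in range(5)
])
spread = float(draws.std(axis=0).mean())
print(f"average spread across 5 imputed draws: {spread:.3f}")
```

Because the second feature here is almost a linear function of the first, the model-based imputer should recover the hidden values far more closely than the column mean does. On real data the gap depends on how informative the other features are and on why the values are missing.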

Speaker Bio:

Narendra is a long-time Bayesian interested in the connections between statistics, causal inference and machine learning. Currently, he is a Machine Learning Scientist at Tripadvisor, based at their global headquarters in Needham, MA. His work at Tripadvisor spans the entire range of customer-centric ML problems, from recommendation engines to building probabilistic models of user-generated content creation. Before Tripadvisor, Narendra obtained his PhD in systems neuroscience from Brandeis University, where he developed probabilistic latent variable models of stimulus coding in the brain. He got into the world of Bayesian machine learning during his PhD, and has been in love with that world ever since! Outside of Bayes and ML, he is an avid cyclist and has explored much of the northeastern US on his bike. To learn more about Narendra, visit his webpage at: https://narendramukherjee.github.io

Disclaimer: All views, thoughts, & opinions expressed in the webinar belong solely to the panelists, & not to the panelists’ employer, organization, committee, other group or individual.