50 Beale Street, 11th floor, San Francisco, CA
6:30 - Food and networking
7:00 - Announcements
7:05 - Nicole White: Codenames: Playing Spymaster with R
7:20 - Jeremy Stanley: XGBoost with Quantile Regression for Predicting Variability in Delivery Times
7:50 - John Mount: Cleaning real world data in R using the vtreat package
8:20 - Norm Matloff: recsys: an Advanced Tool for Recommender Systems
Codenames: Playing Spymaster with R
In Codenames, a popular party game, two teams compete to identify all of their words (or codenames) on a grid of 25 words. One player on each team (called the spymaster) is tasked with giving one-word clues to their teammates to help them identify their words. In this presentation, I'll talk about using R to automate the spymaster's task. Each codename on the board is treated as a document and machine learning techniques are used to find similarities among the codenames, cluster them, and determine the best one-word clue for each cluster. See my blog post for more details.
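The clue-selection step described above can be illustrated with a small sketch. It is written in Python for illustration only (the talk itself uses R) and is not the speaker's method: the word vectors are made-up toy values, the `best_clue` helper is hypothetical, and the scoring rule (pick the candidate whose weakest cosine similarity to the target words is highest) is an assumption. Clustering of the board is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_clue(targets, candidates, vectors):
    """Return the candidate clue closest to ALL target codenames.

    A clue is scored by its lowest similarity to any target, so a good
    clue must relate to every word in the cluster, not just one.
    Words that are themselves targets are not allowed as clues.
    """
    def score(clue):
        return min(cosine(vectors[clue], vectors[t]) for t in targets)
    allowed = [c for c in candidates if c not in targets]
    return max(allowed, key=score)

# Toy, hand-made vectors (hypothetical; real vectors would come from text data).
vectors = {
    "ocean": [0.90, 0.10, 0.00],
    "river": [0.80, 0.20, 0.10],
    "bank":  [0.40, 0.10, 0.80],
    "water": [0.95, 0.15, 0.05],
    "money": [0.00, 0.10, 0.90],
}

clue = best_clue(["ocean", "river"], ["water", "money"], vectors)
```

Scoring by the minimum rather than the average similarity is one possible design choice; it penalizes clues that fit one codename well and another poorly.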
XGBoost with Quantile Regression for Predicting Variability in Delivery Times
At Instacart, we optimize shopper routing to balance the efficiency with which we can fulfill orders against the risk of causing late deliveries. By predicting the quantiles of the expected delivery time for routes in planning, we can estimate the chance a route will result in late deliveries.
In this talk, we will cover:
* Quantile estimation with check loss function
* A smooth approximation that is twice differentiable
* Approximate quantile regression in XGBoost in R with custom objective functions
* Visualizing delivery time variability in maps using ggmap
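The first three topics above can be sketched as follows. This is a minimal illustration in Python (the talk itself uses R), not the speaker's implementation. The check loss is the standard quantile loss; the particular smooth approximation used here, tau*u + alpha*log(1 + exp(-u/alpha)), and the smoothing parameter `alpha` are assumptions chosen because the function is twice differentiable and approaches the check loss as `alpha` shrinks. All function names are hypothetical.

```python
import math

def check_loss(y, pred, tau):
    """Quantile (check) loss for one observation, with residual u = y - pred."""
    u = y - pred
    return u * (tau - (1.0 if u < 0 else 0.0))

def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def _sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smooth_check_loss(y, pred, tau, alpha=0.1):
    """Smooth surrogate: tau*u + alpha*log(1 + exp(-u/alpha)).

    Assumed form, not taken from the talk. It tends to the check loss
    as alpha -> 0 and has continuous first and second derivatives.
    """
    u = y - pred
    return tau * u + alpha * _softplus(-u / alpha)

def smooth_grad_hess(y, pred, tau, alpha=0.1):
    """First and second derivatives of the surrogate with respect to pred."""
    u = y - pred
    s = _sigmoid(-u / alpha)       # equals 1 / (1 + exp(u / alpha))
    grad = s - tau                 # d(loss)/d(pred)
    hess = s * (1.0 - s) / alpha   # d2(loss)/d(pred)2, always positive
    return grad, hess
```

Gradient-boosting libraries with custom objectives, XGBoost among them, generally ask for exactly this pair of per-observation values, the gradient and the Hessian of the loss with respect to the prediction. The plain check loss has a zero second derivative almost everywhere, which is why a smooth surrogate is useful there.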
Cleaning real world data in R using the vtreat package
I’ll share some typical examples of analysis-killing real-world data issues and show how to quickly and correctly prepare data for predictive modeling using the R package vtreat. vtreat is an R data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. I will work through preparing string-valued variables for analysis, handling missing values, and dealing with new values that appear after model deployment. I will also discuss some potential pitfalls of data preparation (nested model bias) and how we avoid them. Users have said vtreat has saved their projects and data science careers. I will show why vtreat should become a key part of your predictive modeling workflow.
recsys: an Advanced Tool for Recommender Systems
The notion of collaborative filtering for recommender systems will be introduced, and several methods will be discussed, some existing and some novel. Our R package 'rectools', which implements these methods, will be introduced with examples. Applications include both the "traditional" (i.e., marketing) and the innovative, such as medicine.