"Official"December 2016 BARUG Meeting

Name: "Official" December 2016 BARUG Meeting
Start: 2016-12-13T18:30:00-08:00
End: 2016-12-13T21:30:00-08:00
Location: Instacart

Hosted By

Joseph R.

Details

Agenda:
6:30 - Food and networking
7:00 - Announcements
7:05 - Nicole White: Codenames: Playing Spymaster with R
7:25 - John Mount: Cleaning real world data in R using the vtreat package
8:00 - Norm Matloff: recsys: an Advanced Tool for Recommender Systems

Note: Due to illness, Jeremy Stanley's talk has been canceled.

#------------------

Nicole White

Codenames: Playing Spymaster with R

In Codenames (https://boardgamegeek.com/boardgame/178900/codenames), a popular party game, two teams compete to identify all of their words (or codenames) on a grid of 25 words. One player on each team (called the spymaster) is tasked with giving one-word clues to their teammates to help them identify their words. In this presentation, I'll talk about using R to automate the spymaster's task. Each codename on the board is treated as a document, and machine learning techniques are used to find similarities among the codenames, cluster them, and determine the best one-word clue for each cluster. See my blog post (https://nicolewhite.github.io/2016/07/19/spymaster.html) for more details.

#------------------

John Mount

Cleaning real world data in R using the vtreat (https://cran.r-project.org/web/packages/vtreat/index.html) package

I’ll share some typical examples of analysis-killing real-world data issues and show how to quickly and correctly prepare data for predictive modeling using the R package vtreat. vtreat is an R data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. I will work through preparing string-valued variables for analysis, handling missing values, and dealing with new values that appear after model deployment. I will also discuss some potential pitfalls of data preparation (nested model bias) and how we avoid them. Users have said vtreat has saved their projects and data science careers. I will show why vtreat should become a key part of your predictive modeling workflow.

#------------------

Norm Matloff

recsys: an Advanced Tool for Recommender Systems

This talk introduces collaborative filtering for recommender systems and discusses several methods, some existing and some novel. Our R package 'rectools', which implements these methods, will be introduced with examples. Applications range from the "traditional," i.e. marketing, to the innovative, such as medical uses.
