Past Meetup

February 2016 Lightning Talks!

136 people went


This will be an all-lightning-talk meetup! Each talk will be 12 minutes, with 3 minutes for questions and hand-off.

6:30 - Pizza and networking
6:55 - Announcements
7:00 - Bradley Shanrock-Solberg: Monte Carlo Methods for Decision Support Using R
7:15 - Matt Dowle: Parallel and Distributed Joining
7:30 - William Sundstrom: Teaching Undergrad Econometrics with R
7:45 - Hossein Falaki: Exploring Large Data Sets with Apache Spark and R
8:00 - Keith Everett: Predicting the Oscars Using the glmnet Package
8:15 - Dennis Noren: Who else acts like this?
8:30 - David Ouyang: An Epic Use of Time
8:45 - Nelson Auner: Operationalizing R at Affirm

Bradley Shanrock-Solberg

Monte Carlo Methods for Decision Support Using R

How to use R to package a model with random elements or data into a clean Monte Carlo simulation, using the "wildpoker" package and a related Shiny application to illustrate the points.
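
The core pattern the talk describes — wrapping a model with random elements in a simulation driver and summarizing the outcome distribution — can be sketched generically. This is a minimal illustration of that pattern in Python, not the "wildpoker" package; the `dice_payoff` model and `simulate` driver are hypothetical stand-ins.

```python
import random
import statistics

def simulate(model, n_trials=10_000, seed=None):
    """Generic Monte Carlo driver: run a model with random elements
    n_trials times and summarize the outcome distribution."""
    rng = random.Random(seed)
    outcomes = sorted(model(rng) for _ in range(n_trials))
    return {
        "mean": statistics.mean(outcomes),
        "stdev": statistics.pstdev(outcomes),
        "p05": outcomes[int(0.05 * n_trials)],
        "p95": outcomes[int(0.95 * n_trials)],
    }

def dice_payoff(rng):
    """Toy decision model: payoff of rolling two dice, doubled on a pair."""
    a, b = rng.randint(1, 6), rng.randint(1, 6)
    total = a + b
    return total * 2 if a == b else total

summary = simulate(dice_payoff, n_trials=20_000, seed=42)
print(summary["mean"])  # close to the analytic expectation of ~8.17
```

Reporting quantiles alongside the mean is what makes the simulation useful for decision support: the tails, not the average, usually drive the decision.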

Matt Dowle

Parallel and Distributed Joining

Matt has taken data.table's radix join and parallelized and distributed it in H2O. He will describe how the algorithm works and provide some benchmarks. H2O is open source on GitHub and is accessible from R and Python using the h2o package on CRAN and PyPI.
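
The reason a radix join parallelizes and distributes well is that once both tables are partitioned by the low bits of the join key, each partition pair can be joined independently. A minimal sketch of that idea in Python (illustrative only — not H2O's or data.table's implementation, and a thread pool stands in for true distributed workers):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(rows, key_index, n_parts):
    """Split rows by the low bits (radix) of their join key.
    n_parts must be a power of two for the bitmask to work."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[key_index] & (n_parts - 1)].append(row)
    return parts

def join_partition(left, right):
    """Hash-join one matching pair of partitions on the first column."""
    index = defaultdict(list)
    for row in right:
        index[row[0]].append(row)
    return [l + r[1:] for l in left for r in index[l[0]]]

def radix_join(left, right, n_parts=4):
    """Partition both sides, then join matching partitions in parallel."""
    lp = partition(left, 0, n_parts)
    rp = partition(right, 0, n_parts)
    with ThreadPoolExecutor() as pool:
        joined = pool.map(join_partition, lp, rp)
    return [row for part in joined for row in part]

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (3, "z")]
print(sorted(radix_join(left, right)))
# [(1, 'a', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
```

In a distributed setting the partitions are shipped to different machines rather than different threads, but the independence property is the same.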

William Sundstrom

Teaching Undergrad Econometrics with R

Modern training in economics must include not only learning abstract theories and models but also conducting hands-on data analysis to establish causal relationships. At Santa Clara University we have implemented a required econometrics course for all our economics majors. With a dedicated lab section, R tutorials and exercises aligned with the statistical content, and judicious use of packages, we have been able to get our students coding, running, and interpreting fairly sophisticated regression analyses using R in a 10-week term.

Hossein Falaki

Exploring Large Data Sets with Apache Spark and R

In this meetup I will introduce SparkR and how it integrates the two worlds of Spark and R. I will demonstrate one of the most important use cases of SparkR: exploratory analysis of very large data. Specifically, I will show how Spark's features and capabilities, such as caching distributed data and integrated SQL execution, complement R's great tools such as visualization and diverse packages in a real-world data analysis example.

Keith Everett

Predicting the Oscars Using the glmnet Package

Dennis Noren

Who else acts like this?

An existing smartphone app, 'Marquee', needed a recommender: given an actor or actress, which others are similar? The app uses the open-source TMDb movie/actor database and its API. Because no user-click history was available, a method based on the R package 'TMDb', feature sets, and similarity measures was developed.
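
With no click history, a content-based approach like this reduces to comparing feature sets. A minimal sketch of the idea in Python, using Jaccard similarity over hypothetical feature sets — the actor names and features here are made up, and the real app derives its features from TMDb credits via the API:

```python
def jaccard(a, b):
    """Set similarity: |intersection| / |union|, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical per-actor feature sets (genres, eras, collaborators).
features = {
    "Actor A": {"drama", "thriller", "1990s"},
    "Actor B": {"drama", "thriller", "2000s"},
    "Actor C": {"comedy", "2000s"},
}

def most_similar(name, features, top_n=2):
    """Rank every other actor by feature-set similarity to `name`."""
    scores = [(other, jaccard(features[name], feats))
              for other, feats in features.items() if other != name]
    return sorted(scores, key=lambda s: -s[1])[:top_n]

print(most_similar("Actor A", features))
# [('Actor B', 0.5), ('Actor C', 0.0)]
```

The quality of such a recommender lives almost entirely in the feature engineering — which movie attributes go into each actor's set — rather than in the similarity measure itself.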

David Ouyang

An Epic Use of Time

Outcomes Research using the Electronic Medical Record System

With the adoption of electronic medical records, there has been an exponential increase in data about medical care. There is little information on how physician staffing affects patient care; in our analysis, we correlate physician computer-usage logs with patient outcomes.