February 2016 Lightning Talks!


Details
This will be an all-lightning-talk meetup! Each talk will be 12 minutes, plus 3 minutes for questions and handoff.
Agenda
6:30PM - Pizza and networking
6:55 - Announcements
7:00 - Bradley Shanrock-Solberg: Monte Carlo Methods for Decision Support Using R
7:15 - Matt Dowle: Parallel and Distributed Joining
7:30 - William Sundstrom: Teaching Undergrad Econometrics with R
7:45 - Hossein Falaki: Exploring Large Data Sets with Apache Spark and R
8:00 - Keith Everett: Predicting the Oscars Using the glmnet Package
8:15 - Dennis Noren: Who else acts like this?
8:30 - David Ouyang: An Epic Use of Time
8:45 - Nelson Auner: Operationalizing R at Affirm
#---------------------------------------
Bradley Shanrock-Solberg
Monte Carlo Methods for Decision Support Using R
How to use R to package a model with random elements or data into a clean Monte Carlo simulation, illustrated with the "wildpoker" package and its related Shiny application.
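The general Monte Carlo pattern is to run a model with random inputs many times and summarize the results. Below is a minimal sketch in Python (not R, and not the "wildpoker" package's API), assuming a hypothetical model that deals a random 5-card hand and records whether it contains at least one pair. The names `monte_carlo`, `pair_trial`, and `has_pair` are illustrative. For a yes/no outcome the estimate is the hit rate p, with standard error sqrt(p(1-p)/n).

```python
import math
import random
from itertools import product

# A standard 52-card deck: 13 ranks x 4 suits, as (rank, suit) tuples.
DECK = list(product(range(13), range(4)))


def has_pair(hand):
    """True if at least two cards in the hand share a rank."""
    ranks = [rank for rank, _suit in hand]
    return len(set(ranks)) < len(ranks)


def pair_trial(rng):
    """One random trial (hypothetical model): deal 5 cards without replacement."""
    return has_pair(rng.sample(DECK, 5))


def monte_carlo(trial, n_trials, seed=None):
    """Run a yes/no trial n_trials times; return (estimate, standard error)."""
    rng = random.Random(seed)  # seeded generator makes runs reproducible
    hits = sum(trial(rng) for _ in range(n_trials))
    p = hits / n_trials
    se = math.sqrt(p * (1 - p) / n_trials)
    return p, se


if __name__ == "__main__":
    est, se = monte_carlo(pair_trial, 20_000, seed=42)
    # Exact answer: 1 - P(all five ranks distinct).
    exact = 1 - math.comb(13, 5) * 4**5 / math.comb(52, 5)
    print(f"estimate={est:.4f} +/- {1.96 * se:.4f}, exact={exact:.4f}")
```

Because the exact answer is known here (about 0.4929), the simulation can be checked against it. For a real decision model there is usually no closed form, which is the reason to simulate.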
#---------------------------------------
Matt Dowle
Parallel and Distributed Joining
Matt has taken data.table's radix join and parallelized and distributed it in H2O. He will describe how the algorithm works and provide some benchmarks. H2O is open source on GitHub and is accessible from R and Python using the h2o package on CRAN and PyPI.
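The core idea that makes a join parallelizable is partitioning: if both tables are split by the low bits of the join key, matching rows always land in the same partition pair, so each pair can be joined independently on its own worker. Below is a minimal sketch in Python of a radix-partitioned hash join. This is a related but different technique from data.table's sort-based radix join and is not H2O's implementation. The function names are hypothetical, and a thread pool stands in for distributed nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def radix_partition(rows, key, bits):
    """Split rows into 2**bits buckets using the low bits of an integer key."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for row in rows:
        parts[row[key] & mask].append(row)
    return parts


def join_partition(left_part, right_part, key):
    """Inner-join one pair of partitions using an in-memory hash index."""
    index = defaultdict(list)
    for r in right_part:
        index[r[key]].append(r)
    # Merge each left row with every right row sharing its key.
    return [{**l, **r} for l in left_part for r in index.get(l[key], [])]


def radix_join(left, right, key, bits=4, workers=4):
    """Inner join: partition both sides, then join partition pairs in parallel."""
    lparts = radix_partition(left, key, bits)
    rparts = radix_partition(right, key, bits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(join_partition, lparts, rparts, [key] * len(lparts))
        return [row for part in results for row in part]


if __name__ == "__main__":
    left = [{"id": i, "x": i * 10} for i in range(8)]
    right = [{"id": i, "y": -i} for i in range(0, 16, 2)]
    rows = radix_join(left, right, "id", bits=2)
    print(sorted(r["id"] for r in rows))  # [0, 2, 4, 6]
```

No rows ever need to cross partitions, which is why the same scheme extends from threads to separate machines.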
#----------------------------------------
William Sundstrom
Teaching Undergrad Econometrics with R
Modern training in economics must include not only learning abstract theories and models but also conducting hands-on data analysis to establish causal relationships. At Santa Clara University we have implemented a required econometrics course for all our economics majors. With a dedicated lab section, R tutorials and exercises aligned with the statistical content, and judicious use of packages, we have been able to get our students coding, running, and interpreting fairly sophisticated regression analyses using R in a 10-week term.
#----------------------------------------
Hossein Falaki
Exploring Large Data Sets with Apache Spark and R
In this meetup I will introduce SparkR and how it integrates the two worlds of Spark and R. I will demonstrate one of the most important use cases of SparkR: exploratory analysis of very large data. Specifically, using a real-world data analysis example, I will show how Spark's features, such as caching distributed data and integrated SQL execution, complement R's strengths, such as its visualization tools and diverse packages.
#------------------------------------------
Keith Everett
Predicting the Oscars Using the glmnet Package
#---------------------------------------------
Dennis Noren
Who else acts like this?
An existing smartphone app, 'Marquee', needed a recommender: "given an actor/actress, which others are similar?". It uses the open-source TMDb movie/actor database and its API. No user-click history was available, so a method based on the R package 'TMDb', feature sets, and similarity measures was developed.
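Without click history, one common approach is content-based: describe each actor by a set of features and rank the others by how much their sets overlap. Below is a minimal sketch in Python using Jaccard similarity. The feature labels and actor names are hypothetical placeholders, the code does not call the TMDb API, and the talk's actual features and similarity measure may differ.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, features, k=3):
    """Return the k (name, score) pairs most similar to target, best first."""
    scores = [
        (jaccard(features[target], feats), name)
        for name, feats in features.items()
        if name != target
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by name
    return [(name, round(score, 3)) for score, name in scores[:k]]


# Hypothetical feature sets; real ones might come from genres, co-stars, eras.
EXAMPLE_FEATURES = {
    "Actor A": {"genre:drama", "genre:crime", "decade:1990s", "costar:Actor C"},
    "Actor B": {"genre:drama", "genre:crime", "decade:2000s"},
    "Actor C": {"genre:comedy", "decade:1990s", "costar:Actor A"},
    "Actor D": {"genre:animation", "genre:family"},
}

if __name__ == "__main__":
    print(most_similar("Actor A", EXAMPLE_FEATURES, k=2))
    # [('Actor B', 0.4), ('Actor C', 0.167)]
```

Jaccard ignores how common a feature is; weighting rare features more heavily (for example with TF-IDF and cosine similarity) is a common refinement.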
#-------------------------------------------
David Ouyang
An Epic Use of Time
Outcomes Research using the Electronic Medical Record System
With the adoption of electronic medical records, there has been an exponential increase in data about medical care. There is little information on how physician staffing affects patient care; in our analysis, we correlate physician computer usage logs with patient outcomes.
#---------------------------------------------
