"Official" April 2014 Meetup
Details
Agenda
6:30 - Pizza and networking
7:00 - Announcements
7:05 - Talks begin
Each speaker will have 12 minutes with 3 minutes for Q&A
Speakers:
- SriSatish Ambati and Anqi Fu: Scalable in-memory ddply(), randomForest, gbm with library(h2o) on Hadoop
- Gaston Sanchez: Creating Arc Diagrams with R
- Winston Chen: Data Analysis with RStudio and MongoDB
- Raman Kapur: Managing Enterprise Cyber Risk by Leveraging Big Data Analytics
- Ram Narasimhan: The weatherData package
- Sara Brumbaugh: Running R from Excel through VBA
- Giovanni Seni: The REgo package for Rule Ensembles
Details
Scalable in-memory ddply(), randomForest, gbm with library(h2o) on Hadoop
This lightning talk highlights an easy way to run R on Hadoop with H2O. Users write regular single-threaded ddply code, 'magic' happens, and it runs in parallel, distributed across multiple machines, with or without Hadoop. The talk includes a short demo and an overview of the architecture: efficient compressed Distributed Frames and a fast execution framework.
SriSatish Ambati
Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was the Director of Engineering at DataStax. Before that, Sri was a Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale.
Anqi Fu
Anqi is a data hacker at 0xdata, where she works on bringing a seamless R experience to big data. She earned a master's degree in Economics and a second master's degree in Statistics at Stanford, and a Computer Science degree from Maryland. Her interests include machine learning and optimization.
#-------------------------------
Creating Arc Diagrams with R
An arc diagram is another way of representing a two-dimensional graph. The nodes are arranged along a horizontal (or vertical) axis, and the edges between the nodes are displayed as arcs. Inspired by the “Similar Diversity” arc diagram (by Steinweber and Koller), Gaston will describe the process he went through with R in order to emulate a Similar Diversity arc diagram using the movie scripts of the Star Wars original trilogy.
Gaston Sanchez, PhD
Gaston is a statistical programmer working on multivariate methods for analyzing multiblock data, and on data visualization approaches with dimension reduction techniques. He is an enthusiastic useR and the author of several R packages (e.g. `plspm`, `plsdepot`, `tester`). Currently, he is a guest researcher in the Nielsen Group at UC Berkeley.
#---------------------------------
Building up an easy data analysis platform with RStudio server on top of your MongoDB (http://winston.attlin.com/2014/01/building-up-easy-data-analysis-platform.html)
Winston Chen
#---------------------------------
The weatherData package
"weatherData" is an R package that helps retrieve timestamped weather data from the Web as easy-to-use data frames. In this short talk, we will look at some of the enhancements in the newest release of the package. We will also look at a Shiny application that makes use of the data.
Ram Narasimhan is a Bay Area-based operations researcher who works on logistics problems.
#----------------------------------
Managing Enterprise Cyber Risk by Leveraging Big Data and Analytics
Conventional security tools are unable to effectively prevent the increasing number of cyber attacks from succeeding.
The missing piece is that organizations do not have a correlated, enterprise-wide view of their cyber security risk. We will highlight how organizations can leverage the power of Big Data and Analytics using R to locate and measure enterprise cyber security risk, and consequently prevent these risks in a prioritized manner.
Raman Kapur
#-----------------------------------
Running R from Excel through VBA: Turning your Old Scripts into Interactive Tools
R scripts can be combined with Excel's macro language (VBA) to pass inputs and parameters from a worksheet to a batch process, and to return the results to the worksheet. Storing the script in a hidden worksheet makes the work even easier. Examples are given, and the addition of a custom Excel menu is demonstrated.
Sara Brumbaugh
#--------------------------------------
REgo
REgo is an open-source contribution that "provides a command-line batch interface to the RuleFit statistical model building program. RuleFit refers to Professor Jerome Friedman's implementation of Rule Ensembles."
Giovanni Seni


