January 2016 "Official Meetup"


Details
Agenda
• 6:30 Pizza & networking
• 7:00 Announcements
• 7:05 Nicole White & Oliver Keyes - Python String Methods in R (Lightning Talk)
• 7:25 Indrajit Roy and Michael Lawrence - Unifying distributed computing in R, Distributed R, and SparkR
• 8:00 Michael Lawrence - Interfacing R with the Solr document database (Lightning Talk)
• 8:15 Jared Lander - Reducing Uncertainty with Bayesian Regression
-----------------------------------------------------------------------
Python String Methods in R
Nicole White
The pystr package (https://mran.revolutionanalytics.com/package/pystr/) provides string operations in R, the Python way.
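For attendees less familiar with the Python side, here is a small sketch of what "the Python way" of string handling looks like, using only Python's built-in str methods. It is written in Python, not R, and makes no claim about which of these methods pystr mirrors or how it names them.

```python
# Built-in Python string methods; the sample strings are illustrative only.
text = "hello world"

capitalized = text.capitalize()         # first character upper-cased, rest lower-cased
o_count = text.count("o")               # number of non-overlapping occurrences of "o"
head, sep, tail = text.partition(" ")   # split once around the first separator
centered = "R".center(5, "*")           # pad on both sides to a total width of 5

print(capitalized)   # Hello world
print(o_count)       # 2
print(head, tail)    # hello world
print(centered)      # **R**
```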
------------------------------------------------------------------------
Unifying distributed computing in R, Distributed R, and SparkR
Indrajit Roy (Hewlett Packard Labs) and Michael Lawrence (Genentech)
There are many interfaces between R and distributed computing systems. These interfaces tend to be custom, non-standard, and difficult to learn. Lack of standardization has resulted in redundant effort in learning these APIs as well as in implementing distributed applications in R.
We have recently released a package on CRAN called ddR (Distributed Data structures in R, https://mran.revolutionanalytics.com/package/ddR/). It declares a unified API for distributed computing in R and ensures that R programs written against the API are portable across different backends, such as Distributed R and Spark.
ddR defines distributed analogs of three central R data structures: data.frame, list and array. The user executes parallel computations through functional iteration, i.e., apply functions. We preserve consistency with base R data structures and functions, so as to provide a simple path for users to port computations to distributed systems. For convenience, and as a proof of concept, we have also released ddR implementations of several canonical machine learning algorithms.
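The idea of a partitioned data structure driven by apply-style functions can be sketched in a few lines. The example below is a conceptual illustration in Python, not ddR's API: the names PartitionedList, from_items, collect and part_apply are hypothetical, and a local thread pool stands in for a real distributed backend.

```python
from concurrent.futures import ThreadPoolExecutor


class PartitionedList:
    """Hypothetical stand-in for a distributed list: items are stored in chunks."""

    def __init__(self, parts):
        self.parts = [list(p) for p in parts]

    @classmethod
    def from_items(cls, items, nparts):
        """Split items into roughly nparts contiguous partitions (nparts >= 1)."""
        items = list(items)
        size = max(1, -(-len(items) // nparts))  # ceiling division, at least 1
        return cls(items[i:i + size] for i in range(0, len(items), size))

    def collect(self):
        """Gather all partitions back into one ordinary list."""
        return [x for part in self.parts for x in part]


def part_apply(func, plist, workers=2):
    """Apply func to every element, handling each partition as a separate task."""
    def run(part):
        return [func(x) for x in part]

    # Assumption: a thread pool plays the role of the cluster here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        new_parts = list(pool.map(run, plist.parts))  # map preserves partition order
    return PartitionedList(new_parts)


numbers = PartitionedList.from_items(range(1, 9), nparts=4)
squares = part_apply(lambda x: x * x, numbers)

print(len(squares.parts))   # 4
print(squares.collect())    # [1, 4, 9, 16, 25, 36, 49, 64]
```

Because the caller only ever touches from_items, part_apply and collect, the code that does the partition-level work could be swapped for a different executor without changing the calling code, which is the kind of portability the abstract describes.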