Past Meetup

January 2016 "Official Meetup"

189 people went


• 6:30 Pizza & networking

• 7:00 Announcements

• 7:05 Nicole White & Oliver Keyes - Python String Methods in R (Lightning Talk)

• 7:25 Indrajit Roy and Michael Lawrence - Unifying distributed computing in R, Distributed R, and SparkR

• 8:00 Michael Lawrence - Interfacing R with the Solr document database (Lightning Talk)

• 8:15 Jared Lander - Reducing Uncertainty with Bayesian Regression


Python String Methods in R

Nicole White

The pystr package provides string operations the Python way.


Unifying distributed computing in R, Distributed R, and SparkR

Indrajit Roy (Hewlett Packard Labs) and Michael Lawrence (Genentech)

There are many interfaces between R and distributed computing systems. These interfaces tend to be custom, non-standard, and difficult to learn. Lack of standardization has resulted in redundant effort in learning these APIs as well as in implementing distributed applications in R.

We have recently released a package called ddR (Distributed Data structures in R) on CRAN. It declares a unified API for distributed computing in R and ensures that R programs written using the API are portable across different systems, such as Distributed R and Spark.

ddR defines distributed analogs of three central R data structures: data.frame, list and array. The user executes parallel computations through functional iteration, i.e., apply functions. We preserve consistency with base R data structures and functions, so as to provide a simple path for users to port computations to distributed systems. For convenience, and as a proof of concept, we have also released ddR implementations of several canonical machine learning algorithms.
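
The idea of functional iteration over partitioned data can be illustrated with a small, language-neutral sketch. The Python below is not ddR and does not reproduce its API. The helper names (make_partitions, partition_apply, collect) are hypothetical, and a local thread pool stands in for a distributed backend. It only shows the pattern the abstract describes: split a data structure into parts, apply a function to each part independently, then gather the results.

```python
from concurrent.futures import ThreadPoolExecutor


def make_partitions(values, nparts):
    """Split a list into nparts contiguous chunks (hypothetical helper)."""
    size, extra = divmod(len(values), nparts)
    parts, start = [], 0
    for i in range(nparts):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(values[start:end])
        start = end
    return parts


def partition_apply(func, parts):
    """Apply func to every element, with one worker task per partition."""
    def work(chunk):
        return [func(x) for x in chunk]

    # Threads stand in for remote workers; pool.map preserves partition order.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return list(pool.map(work, parts))


def collect(parts):
    """Gather the partitions back into one ordinary list."""
    return [x for chunk in parts for x in chunk]


parts = make_partitions(list(range(1, 11)), nparts=3)
squared = partition_apply(lambda x: x * x, parts)
result = collect(squared)
print(result)
```

Because the caller only ever supplies a function and a partitioned structure, the execution backend can be swapped without changing user code, which is the portability argument the abstract makes.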