January 2015 "Official" Meetup

Hosted by Joseph Rickert

Details

Agenda:
6:30PM - Pizza and networking
7:00 - Announcements
7:05 - Ryan Hafen: Tessera
7:30 - Hadley Wickham: Pure, predictable, pipeable: creating fluent interfaces with R
8:15 - Nick Elprin: Effective R parallelization

----------------------------------------

Hadley Wickham - Abstract

Pure, predictable, pipeable: creating fluent interfaces with R.

A fluent interface lets you easily express yourself in code. Over time a fluent interface retreats to your subconscious. You don't need to bring it to mind; the code just flows out of your fingers. I strive for this fluency in all the packages I write, and while I don't always succeed, I think I've learned some valuable lessons along the way.

In this talk, I'll discuss three guidelines that make it easier to develop fluent interfaces:

  • Pure functions. A pure function only interacts with the world through its inputs and outputs; it has no side-effects. Pure functions make great building blocks because they are easy to reason about and can be easily composed.

  • Predictable interfaces. It's easier to learn a function if it's consistent, because you can learn the behaviour of a whole group of functions at once. I'll highlight the benefits of predictability with some of my favourite R "WAT"s (including `c()`, `sapply()` and `sample()`).

  • Pipes. Pure, predictable functions are nice in isolation but are most powerful in combination. The pipe, `%>%`, is particularly important when combining many functions because it turns function composition on its head so you can read it from left to right. I'll show you how this has helped me build dplyr, rvest, ggvis, lowliner, stringr and more.
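The three guidelines above can be sketched in a few lines of R. This is an illustration only, not material from the talk: the helper names (`next_id_impure`, `next_id`) are hypothetical, the surprises shown for `sapply()` and `sample()` are commonly cited ones that may differ from those Hadley presents, and the magrittr package is assumed to be installed to provide `%>%`.

```r
library(magrittr)  # assumed installed; provides the %>% pipe

# Impure (hypothetical helper): reads and modifies state outside the function.
counter <- 0
next_id_impure <- function() {
  counter <<- counter + 1
  counter
}

# Pure (hypothetical helper): the result depends only on the input.
next_id <- function(current) current + 1

# Predictability: sapply() chooses its return type from the data ...
empty_sapply <- sapply(list(), length)              # a list
full_sapply  <- sapply(list(1:3, 1:5), length)      # an integer vector
# ... while vapply() always returns the type that was declared.
empty_vapply <- vapply(list(), length, integer(1))  # an integer vector

# sample() treats a single number n as the sequence 1:n.
n_from_two <- length(sample(c(10, 20)))  # permutes the two values
n_from_one <- length(sample(10))         # permutes 1:10

# Pipes: the same computation, read inside-out and then left to right.
x <- c(1, 4, 9, 16)
nested <- round(mean(sqrt(x)), 1)
piped  <- x %>% sqrt() %>% mean() %>% round(1)
```

The nested and piped forms compute the same value; the piped form lists the steps in the order they are applied.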

This talk will help you make best use of my recent packages, and teach you how to apply the same principles to make your own code easier to use.

----------------------------------------------
Ryan Hafen - Abstract

Tessera is a statistical computing environment that enables deep analysis of large, complex data.

Tessera is powered by Divide and Recombine (D&R), an approach for dividing data into meaningful subsets, computing on them in an embarrassingly parallel manner, and combining the results in a way that provides a statistically valid result. At the front end of Tessera, the analyst programs in R. At the back end is a distributed parallel computation environment such as Hadoop. The environment is designed to provide the thousands of statistical, machine learning, and visualization methods available in R and hide all of the details of distributed computing. It is also designed to be back-end agnostic, so that new distributed technologies can be plugged in.

In this talk, I will introduce D&R and Tessera, covering topics in statistical methodology, computation, and visualization related to the environment, as well as research challenges. More information about Tessera can be found at tessera.io (http://tessera.io/).
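The divide, compute, and recombine steps can be sketched with base R alone. This is not Tessera's or datadr's API; it only mimics the pattern in memory on the built-in `mtcars` data. The choice of model, the helper `fit_one`, and averaging the coefficients as the recombination step are all assumptions made for illustration, and whether such an average is statistically appropriate depends on the analysis.

```r
# Divide: split the data into subsets by a meaningful variable.
subsets <- split(mtcars, mtcars$cyl)

# Compute: apply the same method to every subset. Each fit needs only its
# own subset, so this step could run in parallel on a distributed back end.
fit_one <- function(d) coef(lm(mpg ~ wt, data = d))  # hypothetical helper
per_subset <- lapply(subsets, fit_one)

# Recombine: collect the per-subset results and summarise them.
combined <- do.call(rbind, per_subset)  # one row of coefficients per subset
averaged <- colMeans(combined)          # one simple recombination
```

Here `lapply()` stands in for the distributed back end: swapping it for a parallel or cluster-based equivalent would leave the divide and recombine steps unchanged.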

Ryan Hafen is an independent statistical consultant. His research focuses on methodology, tools, and applications in exploratory analysis, statistical model building, and machine learning on large, complex datasets. He is the lead developer of the datadr and Trelliscope components of the Tessera environment.

-------------------------------------------

Nick Elprin - Abstract

Nick will demo some effective techniques for using R with “medium data” problems that are too large for your laptop but not big enough to demand a heavyweight deployment like Hadoop. More specifically, he’ll show how to use multi-core, high-memory machines (easily accessible through EC2 or similar services) along with several R packages to speed up your analyses through parallelization. Demos will include general parallelization tools like the parallel package and the foreach package, as well as some techniques specific to machine-learning tasks.
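As a rough idea of what multi-core parallelization looks like with the parallel package that ships with R, here is a minimal sketch. It is not taken from Nick's demos: the bootstrap task, the helper `boot_mean`, and the choice of two workers are assumptions for illustration.

```r
library(parallel)

# Hypothetical task: one bootstrap replicate of the mean of a numeric vector.
# The first argument is the replicate index supplied by parLapply().
boot_mean <- function(i, values) mean(sample(values, replace = TRUE))

set.seed(1)
values <- rnorm(1000)

cl <- makeCluster(2)          # two local workers; detectCores() reports what is available
clusterSetRNGStream(cl, 123)  # reproducible random-number streams on the workers
res <- parLapply(cl, 1:200, boot_mean, values = values)
stopCluster(cl)               # always release the workers

boot_means <- unlist(res)     # 200 bootstrap estimates of the mean
```

On a larger multi-core machine the same code scales by raising the number of workers passed to `makeCluster()`.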

Bay Area useR Group (R Programming Language)