Please Note The Change Of Venue
This event is a joint event with the Dublin Data Visualization Group and the CeADAR Data Analytics Group.
RSVPs may also be made by email to [masked]
Please confirm your presence on arrival by signing in. Any unclaimed seats will be reallocated at 19:05.
The talk is expected to start at 19:10.
Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University.
He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualisation.
His research focusses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualisation to better understand data and models.
R has a notorious reputation for not being able to deal with "big" data (ggplot2 and plyr are frequent culprits). Fortunately, this isn't an underlying problem with R: it's something we can fix with good programming practices and intelligent use of compiled code.
In this talk, I'll introduce two new packages, bigvis and dplyr, that aim to make it easier (and faster) to work with much larger datasets. Bigvis makes it possible to visualise [masked] million observations in just a few seconds. It is built around a pipeline of group, summarise, smooth and visualise, and makes minimal sacrifices of flexibility to achieve fast performance. As well as discussing the visualisation challenges that arise with tens of millions of observations, I'll discuss the performance challenges, and how C++ and Rcpp make it pleasurable to integrate compiled code into R.
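To make that pipeline concrete, here is a minimal sketch of the group/summarise/smooth/visualise flow using bigvis; the simulated data, bin width, and smoothing bandwidth are invented for illustration, not taken from the talk.

    library(bigvis)

    x <- rnorm(1e7)  # ten million simulated observations

    # Group + summarise: condense the raw values into fixed-width bins
    binned <- condense(bin(x, width = 0.1))

    # Smooth the binned summary, then visualise the (small) summary,
    # not the raw data
    smoothed <- smooth(binned, h = 0.5)
    autoplot(smoothed)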
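And a toy example of the kind of Rcpp integration mentioned above: cppFunction() compiles a C++ snippet and exposes it to R in a single call. The function itself (sumC) is made up purely to show how little ceremony is involved.

    library(Rcpp)

    # Compile a small C++ loop and make it callable from R
    cppFunction('
    double sumC(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) {
        total += x[i];
      }
      return total;
    }')

    sumC(runif(1e6))  # behaves like any ordinary R function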
Dplyr is an iteration of plyr that focusses on the tools people use most frequently (ddply, dlply and ldply), on speed, and on flexible data stores, so that you can use the same code regardless of whether your data is in a data frame, data table, or database. I'll talk a little about how easy it is to compile simple R expressions into SQL, and about integrating R into a workflow when your complete dataset can't fit into memory, or even on the hard drive of a single machine.
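As a rough illustration of the "same code, different stores" idea, here is a hedged sketch using dplyr verbs as they exist today; the flights table, its columns, and the connection con are hypothetical.

    library(dplyr)

    flights <- data.frame(carrier   = c("AA", "AA", "UA"),
                          dep_delay = c(5, 12, 3))

    # Against a local data frame
    flights %>%
      group_by(carrier) %>%
      summarise(avg_delay = mean(dep_delay))

    # Against a database the same verbs apply, but the expression is
    # translated to SQL and run remotely (con is a hypothetical DBI connection)
    tbl(con, "flights") %>%
      group_by(carrier) %>%
      summarise(avg_delay = mean(dep_delay)) %>%
      show_query()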