Reproducible computation at scale with drake

R-Ladies Cambridge

Online event

The techniques in this tutorial enhance the maintainability, hygiene, speed, scale, and reproducibility of complicated projects with long runtimes. The drake R package resolves the dependency structure of data analysis pipelines, skips tasks that are already up to date, and cleanly organizes the output. Workflows with drake are efficient to maintain, and they provide tangible evidence that the results are synchronized with the underlying code and data, which increases one’s ability to trust the conclusions of the research.

Learning objectives:
1. Function-oriented programming: participants will learn to express data analysis tasks as user-defined functions, experience the advantages of functions relative to imperative scripts, and understand the role of functions in drake-powered projects.
2. Declarative workflows: participants will declare targets to represent data analysis artifacts, identify the dependencies of those targets, and inspect the emergent dependency structures of entire workflows.
3. Project maintenance: participants will experience drake’s responses to changing code and data, and they will understand the conditions that allow drake to skip computation and save time.
4. Large plans: participants will practice declaring large collections of targets compactly using drake’s domain-specific language.
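
As a rough preview of these objectives, the sketch below shows what a very small drake workflow might look like. It is not taken from the workshop materials: the helper functions `simulate_data()` and `fit_model()` and the toy data are hypothetical, and the code assumes a recent release of drake installed from CRAN.

```r
library(drake)

# Hypothetical helper functions (illustrative only, not from the workshop).
simulate_data <- function(n) {
  data.frame(x = seq_len(n), y = 2 * seq_len(n) + 1)
}

fit_model <- function(data) {
  lm(y ~ x, data = data)
}

# Declarative workflow: each argument is a target. drake reads the commands
# and infers that `model` depends on `data`, and `slope` depends on `model`.
plan <- drake_plan(
  data = simulate_data(10),
  model = fit_model(data),
  slope = coef(model)[["x"]]
)

make(plan)    # First run: builds all three targets and caches them.
make(plan)    # Second run: nothing has changed, so the targets are skipped.
readd(slope)  # Retrieve a finished target from the cache.

# Large plans: static branching declares one target per value of `n`.
big_plan <- drake_plan(
  data = target(simulate_data(n), transform = map(n = c(10, 20, 30)))
)
```

Editing the body of `fit_model()` and calling `make(plan)` again should rebuild `model` and `slope` while leaving `data` untouched, which is the skipping behavior described in objective 3. Note that `make()` writes its cache to a `.drake` folder in the working directory.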

Computing requirements:
Each participant must have a laptop with a web browser and the ability to connect to the internet. Participants will spend most of their time working through R notebooks and supporting Shiny applications.
• notebooks:
• learndrakeflow app:
• learndrakeplans app:
• drakeplanner app:
Online RStudio Server instances will be provided so participants can interact with the notebooks without needing to install anything locally.

The instructor:
Will Landau received his PhD in Statistics at Iowa State University in 2016. His dissertation research introduced a novel fully Bayesian, hierarchical model-driven, GPU-accelerated approach to the analysis of heterosis gene expression data (Landau, Niemi, and Nettleton 2019). He currently works at Eli Lilly and Company, where he develops capabilities for clinical statisticians.
Will is the creator and maintainer of rOpenSci’s drake R package. His most recent presentations on drake are the rOpenSci Community Call on September 24, 2019 (video recording at [masked]) (Landau and Butland 2019) and a half-day workshop at R/Pharma 2019 (Landau 2019).