Efficient #Rstats Workflows with the {drake} and {orderly} packages


Details
We welcome 2020 with two talks that we are very excited about!
Approximate schedule:
18.30: doors open
18.45: chitchat, announcements
19.00 - 19.30: TALK 1, Rich FitzJohn - {orderly}
19.35 - 20.05: TALK 2, Matt Dray - {drake}
EventBrite tickets link: http://bit.ly/bioinfoLDN_30-01-2020
(FREE - Required for building security)
We start the year with #Rstats and we couldn't be more enthusiastic about our invited speakers!
For January's edition of #Bioinformatics London we are revisiting reproducible workflows, but this time the discussion will revolve around the R ecosystem.
TALK 1: Reproducible reporting and putting research code into production with the {orderly} R package
Speaker: Rich FitzJohn, PhD - Research Software Engineer at MRC Centre for Global Infectious Disease Analysis, Imperial College London (https://twitter.com/rgfitzjohn)
Rich FitzJohn will start his talk by outlining some practical reasons why reproducibility remains a challenge for analysis and why existing reproducible research solutions miss important challenges for scientists and analysts, and by drawing analogies with the emergence of "structured programming".
The second part of the talk will introduce the {orderly} package, which aims to address the concerns raised. Importantly, rather than focusing on how a scientist implements a particular analysis, {orderly} establishes conventions around inputs and outputs of analysis, then borrows ideas from git and docker to carry out the analysis in a somewhat isolated way, with the result that research inputs and outputs are always easily associated.
Rich has been a research software engineer in the "RESIDE" group (https://reside-ic.github.io) at Imperial College for the last 4 years. Before moving to work full-time as an RSE, his research career involved modelling coexistence in tropical forests, diversification over macro-evolutionary timescales and the potential for gene flow from genetically-modified crops. You can read more about the {orderly} R package here: https://vimc.github.io/orderly/articles/orderly.html
TALK 2: Reproducible workflows in R with the {drake} R package
Speaker: Matt Dray, PhD - Data Scientist, Cabinet Office (https://twitter.com/mattdray)
In the first part of his talk, Matt Dray will explain what a workflow manager is and why you need one (spoilers: to reduce error and improve reproducibility). The second part will explain why the {drake} package by Will Landau is a great solution (spoilers: it's R-specific, handles distributed computing and has great documentation). 'What gets done stays done!'
Matt has worked with data in government for Cabinet Office, Government Digital Service, Department for Education and Department for Environment, Food and Rural Affairs. Before that he got an Ecology PhD at Cardiff University and an Entomology MSc at Imperial.
A few words about Will Landau, the creator and maintainer of the {drake} R package:
Will is a Research Scientist at Eli Lilly and Company, where he develops methods and software for statisticians. Prior to joining Lilly, he earned his PhD in Statistics from Iowa State University. You can get started with {drake} by checking out a presentation of the package at the following rOpenSci community call: https://ropensci.org/commcalls/2019-09-24.
In that call, Will introduces the challenges of large computation, expounds the virtues of function-oriented programming, and walks through a practical deep learning example with {drake}. Feel free to get in touch with Will about {drake} by opening an issue at https://github.com/ropensci/drake/issues.
Ticketing is required: the event is still free, but please register via EventBrite.
(After the talks end, we usually adjourn to the pub)
