June 2017 "Official" BARUG Meetup


157 people went


6:30 Pizza and networking
7:00 Announcements
7:05 Pete Mohanty: "Did “Communities in Crisis” Elect President Trump? An Analysis using Kernel Regularized Least Squares"
7:35 Earl Hubbell: "Reproducible Data Science for Cancer Research at GRAIL"
8:10 Ashley Semanskee: "Using R Markdown to auto-write annual survey reports"

Pete Mohanty
Did “Communities in Crisis” Elect President Trump? An Analysis using Kernel Regularized Least Squares

In both popular and academic discussions, commentators argue that Trump's success was partly attributable to his appeal in “collapsing” communities. Suicides, drug overdoses, and other so-called “deaths of despair” rose sharply among non-Hispanic whites over the last several decades, and communities that have faced these challenges do seem to have been more likely to vote for Trump. Analyses to date, however, have relied on implausible simplifying assumptions. Kernel Regularized Least Squares (KRLS) can estimate more nuanced models, which in this case reveal complex spatial dependencies and modest effect sizes that challenge the notion that Trump support was a simple response to public health concerns.

Because the underlying model is built on pairwise comparisons between all observations, KRLS is inherently computationally intensive, so I also briefly introduce bigKRLS, which is newly available on CRAN. bigKRLS offers algorithmic improvements that ease scaling constraints, reduce runtime (even on a single core), and make results easier to interpret through Shiny. Finally, I discuss the potential of KRLS by briefly comparing it with Elastic Net LASSO and Random Forest estimates, and I outline recent additions to the package on GitHub.
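To show where the pairwise cost comes from, here is a minimal Python/NumPy sketch of the KRLS idea. It is not the KRLS or bigKRLS implementation, and the function names `gaussian_kernel` and `krls_fit` are hypothetical. The sketch assumes a Gaussian kernel exp(-||x_i - x_j||^2 / sigma) on standardized covariates, with sigma set to the number of covariates as a common default. It also uses a fixed regularization parameter `lam` instead of one tuned from the data.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """N x N Gaussian kernel: one entry per pair of observations (hypothetical helper)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distance between every pair of rows
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by floating-point rounding
    return np.exp(-d2 / sigma)

def krls_fit(X, y, lam=1.0):
    """Sketch of a KRLS-style fit; lam is fixed here (an assumption), not tuned."""
    # Standardize each covariate (assumes no constant columns)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    sigma = Xs.shape[1]  # assumed default: sigma = number of covariates
    K = gaussian_kernel(Xs, sigma)  # O(N^2) memory
    n = K.shape[0]
    # Solve (K + lam * I) c = y for the coefficient vector c: O(N^3) time
    c = np.linalg.solve(K + lam * np.eye(n), y)
    fitted = K @ c
    return c, fitted
```

The kernel matrix has one entry for every pair of observations. Memory therefore grows with the square of the sample size, and the linear solve grows roughly with its cube. These are the scaling constraints the abstract says bigKRLS is designed to ease.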

Pete Mohanty is a Thinking Matters Fellow at Stanford University; this talk presents joint work with Robert Shaffer (University of Texas at Austin).


Earl Hubbell
Reproducible Data Science for Cancer Research at GRAIL

GRAIL's mission is to detect cancer early, when it can be cured. To do so, we must develop a deep understanding of early cancer biology, which in turn requires the utmost rigor in deriving insights from our data sets. We will describe how R Markdown, tidy data principles, and the RStudio ecosystem serve as one foundation for reproducible research at GRAIL. For internal and external* work alike, we are committed to industry-leading standards of statistical rigor, data provenance, and reproducibility.

*Examples of external work:
[1] "Performance of a high-intensity 508-gene circulating-tumor DNA (ctDNA) assay in patients with metastatic breast, lung, and prostate cancer"
[2] "Cell-free DNA (cfDNA) mutations from clonal hematopoiesis: Implications for interpretation of liquid biopsy tests"


Ashley Semanskee is a Research Assistant at the Kaiser Family Foundation.

Using R Markdown to auto-write annual survey reports

Annual surveys are often the most important (and most expensive) projects a research organization undertakes. At the Kaiser Family Foundation, we have fielded the Employer Health Benefits Annual Survey (http://kff.org/health-costs/report/2016-employer-health-benefits-survey/) since 1998, and each year the full report takes more than 3,000 hours to complete. This year, we are building an R template that analyzes the findings and auto-writes the report for the 2017 survey and all future surveys, saving thousands of hours of work a year. Using R makes the survey analysis trackable and reproducible, and it eliminates human error.

In this talk, I will describe how to use R, ggplot2, and R Markdown to transform 2,000 survey responses into a 300-page auto-written report.
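The core of auto-writing a report is turning computed estimates into sentences whose wording depends on the numbers. The talk does this with R and R Markdown. The Python sketch below illustrates only the general idea, and everything in it is hypothetical: the function `describe_change`, the 5% relative-change threshold used to call a result "similar," and the example values. A real survey report would more likely decide wording from a significance test than from a fixed threshold.

```python
def describe_change(label: str, current: float, previous: float,
                    threshold: float = 0.05) -> str:
    """Write one report sentence comparing this year's estimate with last year's.

    Hypothetical rule: a relative change smaller than `threshold` is
    described as "similar"; otherwise the direction and size are stated.
    """
    change = (current - previous) / previous
    if abs(change) < threshold:
        return (f"The {label} ({current:,.0f}) was similar to "
                f"last year ({previous:,.0f}).")
    direction = "increased" if change > 0 else "decreased"
    return (f"The {label} {direction} {abs(change):.0%} to {current:,.0f}, "
            f"from {previous:,.0f} last year.")

# Illustrative values only, not survey results
print(describe_change("average premium", 110.0, 100.0))
```

In a template, the same kind of function would run once per finding. Each year's sentences would then be regenerated from that year's data rather than edited by hand.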