BARUG is BACK


Details
Dear BARUG Members,
It has been a long time since we were last able to meet. However, it looks like the time is right for us to try to pick up where we left off. Let's meet on July 18th for an evening of friendship, good conversation, and talk about R.
Agenda:
6:30 PM Pizza and Networking
7:00 PM Bob Horton - Multi-stage Simpson's Paradox Machine
7:30 PM Norm Matloff - Addressing Fairness in Machine Learning
8:00 PM Earl Hubbell - Using RStan to Infer Timescales of Cancer Development
Abstracts
---------------------------------------------------------
Multi-stage Simpson's Paradox Machine
Robert Horton - Microsoft Data Science
We have generated a dataset ('mssp' for short) that demonstrates multiple Simpson's Paradox reversals of the sign of the coefficient of covariate X, depending on which other covariates are included in the analysis. The system is designed so that as we add the covariates 'Z1', 'Z2', 'Z3', 'Z4', and 'Z5', cumulatively and in that order, the coefficient of X for predicting Y repeatedly changes sign:
mssp <- read.csv("multi_stage_simpsons_paradox_data.csv")
coef( lm(Y ~ X, mssp) )['X'] # 1.137793
coef( lm(Y ~ X + Z1, mssp) )['X'] # -0.998305
coef( lm(Y ~ X + Z1 + Z2, mssp) )['X'] # 1.01994
coef( lm(Y ~ X + Z1 + Z2 + Z3, mssp) )['X'] # -1.034066
coef( lm(Y ~ X + Z1 + Z2 + Z3 + Z4, mssp) )['X'] # 1.281207
coef( lm(Y ~ X + Z1 + Z2 + Z3 + Z4 + Z5, mssp) )['X'] # -1.007758
This was done by implementing the Multistage Simpson's Paradox Machine described conceptually by Judea Pearl in 2014. You can download this dataset from our GitHub repo and try the analysis at home; come to the talk to see how it was created!
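For anyone new to the idea, a minimal sketch of a single sign reversal may help before the talk. It uses simulated data, not the mssp dataset, and it is a much simpler one-stage illustration with a single confounder, not Pearl's multistage construction or the method used to build mssp. The names sim, Z, b_marginal, and b_conditional are hypothetical.

```r
# Hypothetical one-stage illustration of a Simpson's paradox sign reversal.
# Simulated data only; this is NOT the mssp dataset or the talk's construction.
set.seed(42)
n <- 10000
Z <- rnorm(n)                        # confounder, variance 1
X <- Z + rnorm(n, sd = 0.5)          # X depends on Z (variance of X is 1.25)
Y <- -1 * X + 3 * Z + rnorm(n)       # effect of X on Y, holding Z fixed, is -1
sim <- data.frame(X, Y, Z)

# Without Z: slope is cov(X, Y) / var(X) = (-1.25 + 3) / 1.25, roughly 1.4
b_marginal <- coef(lm(Y ~ X, sim))["X"]
# With Z: the regression recovers the within-Z effect, roughly -1
b_conditional <- coef(lm(Y ~ X + Z, sim))["X"]
print(c(marginal = b_marginal, conditional = b_conditional))
```

Adding Z flips the sign because, without it, the coefficient on X also absorbs the positive path from Z to Y (3 * 1 / 1.25 = 2.4, and -1 + 2.4 = 1.4). The mssp data chain several such stages so that each added covariate flips the sign again.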
-----------------------------------------------------------------------
Addressing Fairness in Machine Learning
Norm Matloff - Professor of Computer Science, UC Davis
Recently there has been increasing concern that machine learning (ML) algorithms lead to policies that suffer from racial, gender, or other bias. I will present summaries of two recent papers of mine on this topic, one on detecting and describing bias, and the other on remedying it. R code examples will be given.
------------------
Using RStan to Infer Timescales of Cancer Development
Earl Hubbell - Distinguished Scientist at GRAIL
Early detection of cancer occurs during the preclinical phase, before clinical diagnosis, when cancer development cannot be observed directly. We applied a targeted methylation assay and its associated locked classifier for cancer signal detection to blood draws from two large prospective biobank studies: the American Cancer Society Cancer Prevention Study-3, in which blood draws occurred before clinical diagnosis, and the Circulating Cell-Free Genome Atlas Substudy 3, in which blood draws occurred at clinical diagnosis. From these data we can make inferences about the rate of cancer development before diagnosis.
Models for detection of shed tumor DNA in these biobanks were fitted in the Stan Bayesian modeling language via the rstan package. This allowed rich modeling of the factors involved in cancer development and tumor DNA shedding, including study type, cancer type and stage, and a time-dependent rate of cancer development that also varied by cancer type and stage. Results from this modeling were presented at the 2023 Annual Meeting of the American Society of Clinical Oncology (ASCO 2023).