
BARUG is BACK

Hosted by Joseph Rickert

Details

Dear BARUG Members,

It has been a long time since we were last able to meet, but it looks like the time is right for us to try to pick up where we left off. Let's meet on July 18th for an evening of friendship, good conversation, and talk about R.

Agenda:
6:30 PM Pizza and networking
7:00 PM Bob Horton - Multi-stage Simpson's Paradox Machine
7:30 PM Norm Matloff - Addressing Fairness in Machine Learning
8:00 PM Earl Hubbell - Using RStan to Infer Timescales of Cancer Development

Abstracts
---------------------------------------------------------
Multi-stage Simpson's Paradox Machine
Robert Horton - Microsoft Data Science

We have generated a dataset ('mssp' for short) that demonstrates multiple Simpson's Paradox reversals in the sign of the coefficient of covariate X, depending on which other covariates are included in the analysis. The system is designed so that as we add covariates Z1, Z2, Z3, Z4, and Z5, cumulatively and in that order, the coefficient of X for predicting Y repeatedly changes sign:

# Load the mssp dataset (CSV from the talk's GitHub repo)
mssp <- read.csv("multi_stage_simpsons_paradox_data.csv")
# Coefficient of X as Z1..Z5 are added cumulatively; its sign flips at every step
coef(lm(Y ~ X, data = mssp))["X"]                           # 1.137793
coef(lm(Y ~ X + Z1, data = mssp))["X"]                      # -0.998305
coef(lm(Y ~ X + Z1 + Z2, data = mssp))["X"]                 # 1.01994
coef(lm(Y ~ X + Z1 + Z2 + Z3, data = mssp))["X"]            # -1.034066
coef(lm(Y ~ X + Z1 + Z2 + Z3 + Z4, data = mssp))["X"]       # 1.281207
coef(lm(Y ~ X + Z1 + Z2 + Z3 + Z4 + Z5, data = mssp))["X"]  # -1.007758

This was done by implementing the Multistage Simpson's Paradox Machine described conceptually by Judea Pearl in 2014. You can download this dataset from our GitHub repo and try the analysis at home; come to the talk to see how it was created!
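For readers who want to see the basic effect before the talk, here is a minimal sketch of a single Simpson's Paradox reversal on simulated data. It does not use the mssp dataset and is not the construction from the talk; the confounder Z, the seed, the sample size, and the coefficients are illustrative assumptions.

```r
# Minimal single-reversal Simpson's Paradox sketch on simulated data
# (illustrative assumptions only; not the mssp dataset or the talk's method).
set.seed(42)
n <- 10000
Z <- rnorm(n)                 # confounder
X <- Z + rnorm(n)             # Z pushes X up
Y <- X - 3 * Z + rnorm(n)     # holding Z fixed, X raises Y by +1
sim <- data.frame(X = X, Y = Y, Z = Z)

# Ignoring Z, the slope of X is negative; adjusting for Z, it is positive
b_marginal    <- unname(coef(lm(Y ~ X, data = sim))["X"])
b_conditional <- unname(coef(lm(Y ~ X + Z, data = sim))["X"])
print(c(marginal = b_marginal, conditional = b_conditional))
```

Under these assumptions the marginal slope is close to Cov(X, Y) / Var(X) = -1 / 2 = -0.5, while the Z-adjusted slope is close to the true +1. The mssp dataset chains several such reversals together.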
-----------------------------------------------------------------------
Addressing Fairness in Machine Learning
Norm Matloff - Professor of Computer Science, UC Davis

Recently there has been increasing concern that machine learning (ML) algorithms lead to policies that suffer from racial, gender, or other bias. I will present summaries of two recent papers of mine on this topic, one on detecting and describing bias and the other on remedying it. R code examples will be given.

------------------

Using RStan to Infer Timescales of Cancer Development
Earl Hubbell - Distinguished Scientist, GRAIL

Early detection of cancer happens during the preclinical phase, before clinical diagnosis, when cancer development cannot be observed directly. We applied a targeted methylation assay, with its associated locked classifier for cancer signal detection, to blood draws from two large prospective biobank studies: the American Cancer Society Cancer Prevention Study-3, in which blood was drawn before clinical diagnosis, and the Circulating Cell-Free Genome Atlas Substudy 3, in which blood was drawn at clinical diagnosis. From these data we can make inferences about the rate at which cancer develops before diagnosis.

Models for detection of shed tumor DNA in these biobanks were fitted in the Stan Bayesian modeling language via the rstan package. This allowed rich modeling of the factors involved in cancer development and tumor DNA shedding, including study type, cancer type and stage, and a time-dependent rate of cancer development that also varied by cancer type and stage. Results from this modeling were presented at the 2023 annual meeting of the American Society of Clinical Oncology (ASCO 2023).

COVID-19 safety measures

Event will be indoors
Bay Area useR Group (R Programming Language)