Survey analysis: beyond the subscripts


Analysing complex sampling -- cluster samples, unequal probabilities, stratification -- used to be a separate and somewhat isolated area of statistics. Some books on the topic might have given the impression it was mostly about the care and feeding of subscripts. In fact, well-conducted complex sampling gives data with important differences from other data sources, but ones whose impact on the analysis is limited. Modern survey analysis software is aimed at reducing the differences to a minimum, so analysts and domain experts can analyse complex surveys correctly in most straightforward cases. I will talk about how you can use the R survey package to do most of the analyses you already do, but with data from complex surveys.

Thomas Lumley is originally from Australia, but worked in Seattle, at the University of Washington, for 12 years. He is currently Professor of Biostatistics at the University of Auckland, where he teaches statistics and data science. Thomas is the developer of the R survey package, and a member of R Core. He has taught workshops on R in 13 time zones.