Differential Privacy: Averting the risks of De-Anonymization


For decades, we have lived in a golden era of access to high-quality, accurate, and reliable data from the US Census Bureau. These data are used to draw congressional districts, to allocate $800M of federal aid to state and local governments each year, for research, and by private industry. In data science, we often use Census data as the gold standard for characterizing the population. However, recent research shows that the existing data do not sufficiently protect our privacy. In response, the Census Bureau is using the technology of "differential privacy" to radically re-engineer how data from the 2020 Census will be published. Starting in 2020, random noise will be injected into all public data. We'll explore the privacy implications of Census data, what we know so far about these new methods, and the challenges they will present for data users.
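To give a flavor of the "random noise" idea, here is a minimal sketch of the classic Laplace mechanism, the basic building block of differential privacy. This is an illustration only, not the Census Bureau's actual algorithm (which is a more elaborate "TopDown" procedure); the function names and parameter values are our own.

```python
import math
import random

def sample_laplace(scale):
    # Inverse-CDF sampling: if U ~ Uniform(-1/2, 1/2), then
    # -scale * sign(U) * ln(1 - 2|U|) follows a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_count, epsilon, sensitivity=1.0):
    # A count query has sensitivity 1: adding or removing one person
    # changes the answer by at most 1. Smaller epsilon means more noise
    # (stronger privacy) at the cost of less accurate published data.
    return true_count + sample_laplace(sensitivity / epsilon)

# Example: publish a noisy version of a tract population count of 1000.
noisy = laplace_mechanism(1000, epsilon=0.5)
```

The published value is unbiased on average, but any single release is perturbed, which is exactly the accuracy-versus-privacy trade-off that data users will have to grapple with.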
Tim Kuhn is director of the Tennessee State Data Center at the UTK Boyd Center for Business and Economic Research. He provides technical training to data users, disseminates analysis and tools, and offers assistance to state and local government agencies. He serves on the City of Knoxville Census Working Group and was appointed by Governor Bill Lee to the Tennessee Complete Count Committee for Census 2020.
Nicholas Nagle is an associate professor of Geography and Data Science at the University of Tennessee, where he teaches courses on statistics, demography, and Geographic Information Science. Until 2017, he served on a National Academy of Sciences Standing Committee on Reengineering Census Operations 2020, which provided external feedback on changes happening within the Census Bureau.
[For a look into the DP techniques being developed for the Census, have a look at the Jupyter notebooks in this work-in-progress repo: https://github.com/umadesai/census-dp]
Screengrab from https://dpwiki.org/demo-top-down/dp-demo/sim-top-down/demo.html