Suppressing, Synthesizing, and Analyzing Confidential Data

For our August Meetup, we're thrilled to have two speakers who will walk us through a range of options for sharing and working with data that cannot be fully shared. Tommy Shen, Senior Data Analyst at the District of Columbia Office of the State Superintendent of Education, will talk about simulation-based approaches to keeping the public and researchers informed about K-12 education without violating student confidentiality. Then Daniell Toth, Senior Research Mathematical Statistician for the Bureau of Labor Statistics, will dive deeper into mathematical techniques that preserve the key properties of data sets while suppressing potentially identifying information.

Whether you're a government or industry organization looking to provide value to the public or to your customers, or a data scientist who consumes data from other organizations, or merely curious about how these statistical techniques work, this will be a highly valuable presentation.

Plus, we are extremely excited to be branching out a bit as far as location goes! This Meetup will be at the Clarendon, VA offices of Capital One Labs. In addition to being more convenient for our Virginia members (sorry, Maryland, we'll be back in the District in September), Capital One has a fantastic venue that lets us continue the discussion on their roof deck (weather permitting)! To make this happen, the event will be starting 30 minutes earlier than is typical.

Agenda:

• 6:00pm -- Networking, Food, and Refreshments

• 6:30pm -- Introduction

• 6:45pm -- Presentations

• 7:45pm -- Speaker and Audience Discussion

• 8:00pm -- Stay put for Data Drinks!

Bios:

Tommy Shen has a Master's in Public Policy from Georgetown University, and currently works as a Senior Data Analyst for the District of Columbia Office of the State Superintendent of Education. He's interested in open data, R, and waffles. Follow him @Gimperion.

Daniell Toth has a PhD in Mathematics from Indiana University in Bloomington, and works as a Senior Research Mathematical Statistician at the Bureau of Labor Statistics. He is also Associate Editor of The American Statistician, and publishes on topics related to survey methods.

Parking:

For those driving, we encourage you to find parking for this event via our sponsor, ParkMe. ParkMe will help you find the closest, cheapest parking, and has iPhone and Android apps.


  • A former member

    From WaPo today: "Sensitive details are so pervasive in the documents that The Post is publishing only summary tables and charts online." They should've come to the talk!

    http://www.washingtonpost.com/w...

    August 30, 2013

  • Brand N.

    Excellent Meetup! I suggest we have another Meetup there and have Capital One Labs tell us about the work they do, especially their recent acquisition of Bundle to advance their big data agenda. "Bundle gives you unbiased ratings on businesses based on anonymous credit card data."

    I am doing a story about them: http://semanticommunity.info/Da...

    Also, here is the announcement I made about our upcoming conference: http://www.afei.org/events/3A03

    The most comprehensive meeting on Federal trends in Cloud, Semantics and Big Data in the country.

    August 30, 2013

  • David L.

    I thought the venue was fantastic. I was hoping to hear more outcome-based scenarios from the presenters.

    August 29, 2013

  • Nevin H.

    Outstanding. It's true: as soon as you bring privacy into the equation, the utility of your well-scrubbed dataset starts dropping; guess we'd better stick to means and averages within one standard deviation and just make clever comments about those outliers w/o really bringing them in.

    August 29, 2013

    • Tommy S.

      Nevin, one of the reasons I agreed to present yesterday, and one I ultimately did a poor job of conveying, is that I fundamentally believe that we, as a data science community, can do better than sums and averages; that instead of settling for the utility curves presented to us by government agencies, we can expand the universe of possible information and knowledge that can be gleaned from the data that your tax dollars and mine help to collect, without making sacrifices to privacy.

      August 29, 2013

    • Tommy S.

      In many avenues of research, it has become almost Pavlovian to accept data as is without questioning the way data is collected, cleaned, and encoded. Without being able to engage the data providers in a metadata discussion, we are stuck between the options of 1) spending lots of money to collect our own or 2) accepting "official" numbers as is. As the appetite for open data increases and the connectivity of previously disparate data improves, there will be hard constraints on an agency's ability to anticipate new research questions through the creation of a few stock tables that are due for publication in a quarterly report... which is the underlying motivation for my simulation aspirations.

      August 29, 2013

  • Tommy S.

    My slides are here: www.straydots.com/sim_talk_deck/

    As to Kato's comment, it was a relatively tongue-in-cheek comment to illustrate how belief in the 'realness' of a data set can alter behavior. As someone who used to re-identify de-identified data sets for law firms, I'm well aware of the flimsiness of such a proposition... =)

    August 29, 2013

  • epsteinp

    Smart people, great presentations, a benefit to the Data Science Community.

    August 29, 2013

  • Eric

    Question: are the presentation slides available?
    Thank you

    August 29, 2013

  • Kato M.

    Very informative on data synthetics and simulations; however, I think synthetics need to fully incorporate privacy, a.k.a. privacy-by-design, to avoid any reconstruction attacks. Simply stripping out identifiers and then assuming that the data set is synthetic would have consequences, as was shown in the Netflix and AOL data privacy challenges a few years ago.

    August 29, 2013

  • Harlan H.

    The geospatial animation that Keelan created of RSVPs is here: http://www.youtube.com/watch?v=...

    August 29, 2013

  • Oscar O.

    Another great meetup event! Sorry I had to leave early though

    August 28, 2013

  • Jean

    Hey all - What is the protocol for folks on the waitlist getting into the event? Can I show up in hopes that those confirmed may not attend? Just don't want to arrive and get turned away. Thx!

    August 28, 2013

    • Tommy S.

      We're all data scientists (and hopefully gamblers) here. Given the weather, events in town, and that we're meeting in Arlington instead of DC, I would make your own assumptions about the likely turnout today. What's the worst thing that could happen? ;) (Sorry Harlan!)

      3 · August 28, 2013

    • Jean

      Waitlist is dwindling and I am in! See you both there.

      1 · August 28, 2013

  • Cindy C.

    really interested in data analysis

    August 28, 2013

  • Meagan

    I will unfortunately not be able to attend, due to scheduling (and likely traffic). I hope somebody else will be able to take my spot.

    August 28, 2013

  • Omar A.

    Sorry, just found out that I have a work function. I really wanted to make it to this one.....

    August 28, 2013

  • Andy B.

    Really hoping to make it in this one! It is very relevant to me!

    August 28, 2013

  • Andy

    Will there be a webcast or recording of the event?

    August 25, 2013

    • Harlan H.

      No webcast. There will likely be slides and an audio recording.

      2 · August 27, 2013

  • michael k.

    ;oot

    August 25, 2013

  • Yuxi Z.

    First time to join meetup. So excited!

    August 24, 2013

  • Cindy C.

    really interested in data analysis

    1 · August 19, 2013

  • Claire

    Any way to get to the 8th floor without a badge?

    August 14, 2013

    • Harlan H.

      You definitely won't need a badge to get in.

      2 · August 17, 2013

  • chris m.

    Software engineer.

    August 16, 2013

  • Craig

    I am very interested but will not be able to attend. Will video or slides be made available?

    August 13, 2013

  • Asad

    Looking forward to it.

    August 11, 2013

  • Betsy D.

    Always interesting to hear from these presenters.

    August 9, 2013

  • Kato M.

    Am looking forward to this talk.

    August 8, 2013

  • Junko

    Looking forward to it :-)

    August 8, 2013

  • Stephan

    Fiancee's birthday

    August 8, 2013

  • Shelley

    If I am in town that day, this is a good one for me. I run a fed. disclosure review board at a fed. statistical agency. Data confid. is my job.

    August 8, 2013

  • Stephan

    I'll ask others to see if they'll be there.

    August 8, 2013
