Re: [betaNYC] A Day in the Life of a New York City Taxi

From: Joel N.
Sent on: Wednesday, July 16, 2014 11:42 AM
A most interesting thread.

Compounding matters is that computing horsepower to do automated mosaic effect mining is now accessible to just about anyone.  

We're actually working on a project to defend against this, and it's just hard to do it in a way that accounts for all the possible mosaic scenarios without unduly diluting the data.

Perhaps we can take a page from the US Census Bureau, the constitutionally mandated Data Science agency ;) that has been gathering data about us since 1790?*  That is, report just the aggregates and embargo the detail data for 72 years.

Granted, the insight is not down to the address level, but it's useful enough that whole industries (not counting the listicles of the day from several link-bait sites) use this data.

That was our experience with our NYCpedia.com experiment, when we aggregated data beyond census tracts and blocks, precincts, etc. to more familiar, more accessible boundaries.  A lot of the feedback we got was quite positive, since residents found out things about their neighborhood that they didn't know.  But what if we had drilled down to the house level?  I'm sure we would have gotten more complaints.

On the flip side, would Chris have been able to build his awesome viz with aggregate data?  Nope.

I agree with Andrew, Steven, et al. that we should take this experience into account when drafting policy recommendations.  Perhaps one approach is to make open data more Census-like, and embargo the detail data for a certain period of time before general public release.

And those who need the detail data now (researchers, civic hackers, businesses, NGOs, etc.) could file a request through some next-gen FOI mechanism that gives them access to a modern detail-data portal under certain terms & conditions?
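
As a rough illustration of the Census-like release idea above, here is a minimal Python sketch that publishes only per-zone, per-hour pickup counts and withholds any cell with too few trips. The field names ("zone", "hour", "driver"), the sample records, and the threshold of 10 are all hypothetical choices made for illustration; they are not taken from the TLC dataset or from any actual Census Bureau rule.

```python
from collections import Counter

def aggregate_pickups(trips, min_count=10):
    """Count pickups per (zone, hour) and drop cells below min_count.

    `trips` is an iterable of dicts with hypothetical keys "zone" and "hour".
    Small cells are withheld because a handful of trips in a quiet area
    can point back to one driver or one passenger.
    """
    counts = Counter((t["zone"], t["hour"]) for t in trips)
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Made-up sample: 12 morning pickups in a busy zone, 1 late-night pickup
# in a remote one.
trips = (
    [{"driver": "A", "zone": "Midtown", "hour": 9}] * 12
    + [{"driver": "B", "zone": "Far Rockaway", "hour": 3}]
)

public_release = aggregate_pickups(trips)
print(public_release)
```

Only the Midtown cell survives; the single remote pickup, which is the kind of record that is easiest to trace, stays in the embargoed detail data. The cost is the one noted above: a trip-by-trip visualization cannot be built from this output.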

- Joel

* Side note: When the 1880 census took 7 years to compile, the agency turned to Hollerith's tabulating machines, which spawned the Computing Era - and the Computing-Tabulating-Recording Company, later renamed IBM. ;)

=======================================================
Think Different! (http://en.wikipedia.org/wiki/Think_different#Text)
Imagine Different! (http://www.youtube.com/watch?v=H5tOgRD4EqY)


On Wed, Jul 16, 2014 at 10:24 AM, Jeremy Barth <[address removed]> wrote:
Chris, I suggested a possible mosaic effect -- inquiring into the objective correlatives of a Muslim cabbie's daily observances.  It's a trivial Big Data exercise that is a consequence, if entirely unintended (the very essence of the mosaic effect), of the anonymization breakdown.

Sure, cabbie names and medallions are a matter of public record.  But two columns of data don't yield a mosaic.   (Or much of an animation -- I do want to say that your app is way cool.)

If you're asking can I show concrete, particularized harm to an individual -- no, I can't, I'm merely speculating.  But I would argue that the precautionary principle dictates we err on the side of anonymity in public data releases unless there is a very good reason not to.


On Jul 16, 2014, at 9:07 AM, Chris Whong <[address removed]> wrote:

Steven,  

You're talking about much bigger privacy issues in Open Data; I am talking specifically about hack licenses and medallions in New York City.

I have no challenge with the idea that government has a tremendous responsibility to protect privacy and be judicious about the data it releases.  Of course, the intent of the TLC was to hide the hack licenses but still allow you to link together trips from the same driver... that was a failure due to a technical oversight, but what are the consequences in this case, for this dataset?  
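
For readers wondering what kind of oversight makes this possible, here is a minimal Python sketch, assuming the identifiers were hashed with plain, unsalted MD5 (the thread mentions MD5 and lookup tables) and that they come from a small, guessable set. The all-digit ID format, the function names, and the sample ID "0427" are hypothetical and chosen only to keep the demo fast; this is not the TLC's actual scheme. The second function shows one common alternative, a keyed hash (HMAC), which still gives the same driver the same pseudonym, so trips can be linked, but cannot be rebuilt by anyone who lacks the key.

```python
import hashlib
import hmac

def md5_hex(value: str) -> str:
    """Plain, unkeyed MD5 of an identifier (the weak approach)."""
    return hashlib.md5(value.encode("ascii")).hexdigest()

def build_lookup(digits: int) -> dict[str, str]:
    """Hash every possible all-digit ID of the given length.

    The ID space is tiny, so the whole hash -> ID table fits in memory.
    """
    ids = (f"{n:0{digits}d}" for n in range(10 ** digits))
    return {md5_hex(i): i for i in ids}

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Keyed alternative: HMAC-SHA256 with a key that is never published."""
    return hmac.new(secret_key, value.encode("ascii"), hashlib.sha256).hexdigest()

# Hypothetical 4-digit IDs keep the demo quick; longer IDs only take longer.
lookup = build_lookup(digits=4)

published = md5_hex("0427")      # what a plain-MD5 release would contain
recovered = lookup[published]    # reversed with a single dictionary lookup

keyed = pseudonymize("0427", b"kept-off-the-portal")
print(recovered, keyed in lookup)
```

The keyed version only protects the identifier column itself. It does nothing about the mosaic concerns raised elsewhere in this thread, where pickup times and locations are combined with outside information.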

-Chris


On Wed, Jul 16, 2014 at 8:43 AM, Steven Adler <[address removed]> wrote:
>
> Rich,
>
> Cab drivers operate private vehicles with a public medallion pinned to their hood.  The medallion does not empower the public to discover a driver's home address, use of their car for personal transportation, or stops for private purposes.  Passengers of the vehicle may also wish to protect the privacy of their trips.
>
> EVERYONE should expect a right to privacy unless they surrender their right by participating in public activities and/or provide an Opt-in.  None of us would want to live in a city where our personal trips in any car are made instantly public....
>
> Clearly this was the intent of the City when they chose to de-identify the data prior to publication.  We are arguing that they did not go far enough because the data can be re-identified.
>
> I think we should assist the City in creating more sophisticated privacy controls for Open Data that it publishes and this discussion is already providing great ideas thanks to Chris who gave us all a reason to explore it.
>
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
>
> From: Richard Robbins <[address removed]>
> To: [address removed]
> Date: 07/16/[masked]:24 AM
> Subject: Re: [betaNYC] A Day in the Life of a New York City Taxi
>
> ________________________________
>
>
>
> Great discussion.
>
> One question (and I'm not sure of the right answer).... Should cab drivers have an expectation of privacy? They are operating quasi-public vehicles that require a public license. Might we compare them to airline pilots, who are also operating private vehicles but who are subject to government licenses and whose every move is recorded and tracked?
>
> Flip argument is that some passenger trips could also be tracked. Imagine if a stalker / jealous boyfriend (or a reporter) wants to see where a woman went after she got in a cab? If the stalker got the cab # and location of where she got in the cab, he could find the end point of the ride. (Even without the cab #, rides from remote pickup spots could still be tracked just with pickup location / time.)
>
>
> Richard Robbins
> (646)[masked]
> @rich1
>
> On Jul 15, 2014, at 10:25 PM, Chris Whong <[address removed]> wrote:
>
> @Jeremy
>
> Sorry for late response, was AFK most of the evening.  Describing the de-anonymization of these taxi trips as a "Privacy Fiasco" is nothing short of hyperbole, and makes for sensational news pieces and slippery slope "what-ifs" in the comments section.  As Jacob and Andrew have pointed out, the real issue is whether it should have been anonymized in the first place, not that it was de-anonymized.  I suspect that the reporters wouldn't have given a damn if there wasn't any techy talk of MD5 algorithms and lookup tables.
>
> Most open data-minded people I've spoken with over the past few weeks consider this a critical lesson about anonymization on a public dataset where the consequences of deanonymization are negligible, and I am inclined to agree.  Better hack licenses and driver names, which are already public and downloadable in plain text, than social security numbers.  
>
> To put it another way, would I be violating the driver's privacy if I tweeted all of the details of my taxi trip as it happened, including the driver's name, photo and license number?  Does my right to know who he/she is only apply to me as a passenger, right then and there during my ride, or does everyone have a right to know, all the time?  The data are just this information times 173 million with a 6-month delay, collected using our tax dollars, and extremely useful for studying the transportation ecosystem AND the effectiveness of the current regulatory setup.  
>
> -Chris
>
>
>
>
>
>
>
> --
> Please Note: If you hit "REPLY", your message will be sent to everyone on this mailing list ([address removed])
> This message was sent by Chris Whong ([address removed]) from #betaNYC, a Code for America Brigade for NYC.
> To learn more about Chris Whong, visit his/her member profile
> To report this message or block the sender, please click here
> Set my mailing list to email me As they are sent | In one daily email | Don't send me mailing list messages
>
> Meetup, POB 4668 #37895 NY NY USA 10163 | [address removed]