Fw: Introducing csvdedupe

From: Noel Hidalgo | B.
Sent on: Friday, August 16, 2013 1:38 PM
Happy Friday BetaNYC'ers!

This tool comes to you from Chicago, and I wanted to share it with the rest of ya! Looks amazing.

N


Forwarded message:

From: Derek Eder <[address removed]>
Date: Friday, 16 August[masked]:39:36
Subject: Introducing csvdedupe

Howdy opengov-ers!

Today DataMade, in partnership with Knight-Mozilla OpenNews, launched a new open source tool for easily de-duplicating files from the command line: csvdedupe.

Simply feed it a CSV (comma-separated values) file, the list of columns you want it to look at, and some training data, and it will output a de-duplicated file telling you which rows it thinks are the same.
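
To make the idea of "rows it thinks are the same" concrete, here is a minimal Python sketch. It is not csvdedupe's code or its method: instead of learning from training data, it treats two rows as duplicates only when the chosen columns match exactly after lowercasing and collapsing whitespace. The function names, the `cluster_id` column, and the sample rows are all made up for illustration.

```python
import csv
import io


def normalize(value):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def cluster_rows(csv_text, columns):
    """Tag each row of csv_text with a hypothetical 'cluster_id'.

    Rows whose normalized values agree on every column in `columns`
    receive the same cluster_id. This is a simplified stand-in for
    illustration, not the trained matching described in the email.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cluster_ids = {}  # normalized key -> cluster number
    tagged = []
    for row in reader:
        key = tuple(normalize(row[col]) for col in columns)
        # First time a key is seen it gets the next unused number.
        cluster_id = cluster_ids.setdefault(key, len(cluster_ids))
        tagged.append({"cluster_id": cluster_id, **row})
    return tagged


# Made-up sample data: the first two rows differ only in case and spacing.
sample = """name,address
Acme Cafe,123 Main St
acme cafe,123  main st
Other Shop,456 Elm St
"""

rows = cluster_rows(sample, ["name", "address"])
for row in rows:
    print(row["cluster_id"], row["name"], row["address"])
```

Rows that share a `cluster_id` are the ones this sketch considers the same. Exact matching after normalization misses typos and abbreviations, which is the kind of case a trained approach is meant to handle.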

It's built on top of dedupe, an open source Python library that we built to generically de-duplicate any kind of database or flat file. More on that here.

Check out our post on Source for more. Happy de-duping! 

Derek

--
Derek Eder
@derekeder
[address removed]
