

Pecuniary interests register

Chris
user 183361068
Sydney, AU
Post #: 3
I've uploaded the OCR'd copies of the original volume 1 & 2 to here:




I also ran it over the per-person splits that Luke created:



As you'd expect, pretty much everything hand-written gets missed. But it picked up typed info fairly well.

The search in your PDF reader is pretty unforgiving about case-sensitivity & spaces between words, but throwing these files into something like Elasticsearch / Solr should cope with that.
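
To illustrate the case and spacing problem, here is a minimal sketch in plain Python. It is not how Elasticsearch or Solr work internally (they tokenise and analyse text properly); it only shows the simpler idea of normalising both the query and the OCR text before comparing them. The file names and sample text are made up for illustration, and `normalise` and `find_matches` are hypothetical helper names.

```python
import re

def normalise(text):
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())

def find_matches(query, pages):
    """Return the names of pages whose normalised text contains the normalised query."""
    needle = normalise(query)
    if not needle:
        return []
    return [name for name, text in pages.items() if needle in normalise(text)]

# Hypothetical OCR output keyed by file name (made-up sample text, not real register data).
pages = {
    "member_a.txt": "Share holdings:  Example  Hold ings Pty Ltd",
    "member_b.txt": "REAL PROPERTY - residence, Sydney",
}

print(find_matches("example holdings", pages))
print(find_matches("real property", pages))
```

Stripping all spaces tolerates OCR that splits or joins words, but it can also match across word boundaries, so a proper search engine with a tuned analyser is still the better long-term option.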
Nick E.
user 12189673
Sydney, AU
Post #: 3
Awesome!
Nick E.
user 12189673
Sydney, AU
Post #: 4
Picking up the legislative council register today.
Luke B.
user 43605732
Sydney, AU
Post #: 7
That's great, Nick. Will you need to scan it all somehow, or will they give you a soft copy?
Nick E.
user 12189673
Sydney, AU
Post #: 5
Yeah, I need to scan it as they wouldn't give me an electronic copy. It's bound, too, so first I need to work out the best way to un-bind it.
Nick E.
user 12189673
Sydney, AU
Post #: 6
Also need to work out any copyright issues before it goes online etc., so don't expect it for a few days!
Nick E.
user 12189673
Sydney, AU
Post #: 7
While I'm sorting out the other register, I went ahead and started a PyBossa project. If we can get it sorted soon-ish, then we can just skip straight to crowdsourcing rather than getting students or whoever to help out (or possibly some combination of both).

The repo for the task list and PyBossa template is here.

I nicked a template off someone else's project for transcribing PDFs as it has multiple input fields with ability to add more items per entry, which is what we need I think.

Next step is to customise it and get everything working, then we're good to go. Anyone who wants to help out with the template/PyBossa thing, please feel free to dive in to the repo.
Henare D.
henare
Sydney, AU
Post #: 12
There's a very relevant thread happening on the Poplus list about this too: https://groups.google...
Chris
user 183361068
Sydney, AU
Post #: 4

...but throwing these files into something like elasticsearch / solr should cope with that.
Hey folks, so I'm not sure it's useful / interesting, but I went ahead with this for giggles.

I've thrown up a (primitive!) search interface.

Could be interesting augmenting with the PyBossa project results at some point? (I'm still trying to learn-up about that...)

Partway through chopping up the 2012-2013 docs. Will add them into the mix as time allows.
Henare D.
henare
Sydney, AU
Post #: 13

I've thrown up a (primitive!) search interface.
Awesome!