PDF Liberation Hackathon

Together with the Sunlight Foundation and, the Open Source Finance group is organizing a PDF Liberation Hackathon in San Francisco. It would be great to have Open Data Bay Area as a cosponsor for this event. Here is the Sunlight announcement:

  • Marc J.

    Thanks again for signing up for this weekend's hackathon.

    We kick off Friday at 7pm with pizza, beer, soda, opening remarks and team formation.

    Any teams that want to submit a project for judging must do so by 12 noon on Sunday. To be considered, projects must be posted on GitHub. We will fork them into the PDF Liberation GitHub organization at

    RallyPad will be available and stocked with refreshments Friday evening, all day Saturday and on Sunday morning, but you can work from home if you prefer. We hope to have a Google Hangout with other locations Saturday at 10am. Also, you can use IRC #SunlightLabs on freenode to communicate with participants at other sites. This will be a good way to get technical assistance.

    Please send me an email message if you have any questions. Otherwise, I look forward to meeting everyone soon. Marc

    January 16, 2014

  • Marc J.

    Hackathon challenges are now available on the resource page at The hackathon kicks off at 7pm Friday with refreshments, socializing and pitching. Judging starts at Noon Sunday.

    January 14, 2014

  • Marc J.

    Thanks again to everyone who has registered for the PDF Liberation Hackathon. We begin with a welcoming party, food, announcements and pitching this Friday at 7pm. More than 50 people have signed up across four Meetups and our Eventbrite page. Unfortunately, Meetup does not provide organizers with email addresses, so communicating with all registrants is difficult. To stay in the loop ahead of the hackathon, please sign up on Eventbrite at, complete our Google Form survey at or send your email address to marc[at]publicsectorcredit[dot]org.

    January 12, 2014

  • Greg L.

    A couple of ideas to get us thinking:

    tabula-extractor seems to have a pretty minimal test suite, just 7 PDFs produced by a limited range of software. (What? No journalist is analyzing PDFs produced with TeX?!)

    Tabula only does a page at a time? That must be really tedious if you have a 100-page table.

    The Tabula docs say that they can't afford to have it up as a website that people can just use. Perhaps we are smart enough to figure out an on-demand cloud service? Then poor journalists won't have to download and install the thing on their laptops.

    November 21, 2013

    • Greg L.

      Manuel, thanks for responding! It would be very nice if you could make up your wishlist before the hackathon.

      November 25, 2013

    • Manuel A.

      Quick update:
      "Multi page tables" is done. See preview:

      January 11, 2014

  • Marc J.

    Thanks to everyone who has RSVPed for this Meetup. If you haven't already done so, please complete the Google Form at Also, I have made further updates to the resource page at There is now a pretty comprehensive list of PDF Extraction tools on that page. If you are aware of others that I should list, please tell me.

    November 30, 2013

  • Marc J.

    Tabula was mostly developed by a young guy from Argentina. It is going to need a larger community and perhaps some financial support to become a tool capable of supporting large-scale production work. I hope the hackathon will help get Tabula to the next level and/or bring forward other tools that the open data community can rally around. See my resource page at

    November 21, 2013

23 went

Our Sponsors

  • O'Reilly Media

    Provides books and discounts for the group members.

  • Code for America

    CfA team members participate in the planning and organization.

  • Common Crawl

    CC team members participate in planning and organization.

  • Internet Archive

    Internet Archive team members participate in planning and organization.

  • Jetpac

    Jetpac team members participate in planning and organizing this group.

  • Kaggle

    Kaggle team members participate in the planning and organization.

  • Mendeley

    Mendeley team members participate in the planning and organization.

  • Open Knowledge Foundation

    OKF team members participate in the planning and organization.

  • Wikimedia

    Wikimedia team members participate in planning and organization.
