PDF Liberation Hackathon

Together with the Sunlight Foundation and Rally.org, the Open Source Finance group is organizing a PDF Liberation Hackathon in San Francisco. It would be great to have Open Data Bay Area as a cosponsor for this event. Here is the Sunlight announcement: http://sunlightfoundation.com/blog/2013/11/15/opengov-voices-pdf-liberation-hackathon-at-sunlight-in-dc-and-around-the-world-january-17-19-2014/

  • Marc J.

    Thanks again for signing up for this weekend's hackathon.

    We kick off Friday at 7pm with pizza, beer, soda, opening remarks and team formation.

    Any teams that want to submit a project for judging must do so by 12 noon on Sunday. To be considered, projects must be posted on GitHub. We will fork them into the PDF Liberation GitHub organization at https://github.com/pdfliberation.

    RallyPad will be available and stocked with refreshments Friday evening, all day Saturday and on Sunday morning, but you can work from home if you prefer. We hope to have a Google Hangout with other locations Saturday at 10am. Also, you can use IRC #SunlightLabs on freenode to communicate with participants at other sites. This will be a good way to get technical assistance.

    Please send me an email message if you have any questions. Otherwise, I look forward to meeting everyone soon. Marc
    [masked]

    January 16, 2014

  • Marc J.

    Hackathon challenges are now available on the resource page at http://pdfliberation.wordpress.com. The hackathon kicks off at 7pm Friday with refreshments, socializing and pitching. Judging starts at noon Sunday.

    January 14, 2014

  • Marc J.

    Thanks again to everyone who has registered for the PDF Liberation Hackathon. We begin with a welcoming party, food, announcements and pitching this Friday at 7pm. More than 50 people have signed up across four Meetups and our Eventbrite page. Unfortunately, Meetup does not provide organizers with email addresses, so communicating with all registrants is difficult. To stay in the loop ahead of the hackathon, please sign up on Eventbrite at http://www.eventbrite.com/e/pdf-liberation-hackathon-san-francisco-tickets-8842635561, complete our Google Form survey at https://docs.google.com/forms/d/1rXggRHhlprYfHsB64dZ-zpVrhqMitcPkN1vEMD7_03k/viewform or send your email address to marc[at]publicsectorcredit[dot]org.

    January 12, 2014

  • Greg L.

    A couple of ideas to get us thinking:

    tabula-extractor seems to have a pretty minimal test suite, just 7 PDFs produced by a limited range of software. (What? No journalist is analyzing PDFs produced with TeX?!)

    Tabula only does a page at a time? That must be really tedious if you have a 100-page table.

    The Tabula docs say that they can't afford to have it up as a website that people can just use. Perhaps we are smart enough to figure out an on-demand cloud service? Then poor journalists won't have to download and install the thing on their laptops.

    November 21, 2013

    • Greg L.

      Manuel, thanks for responding! It would be very nice if you could make up your wishlist before the hackathon.

      November 25, 2013

    • Manuel A.

      Quick update:
      "Multi page tables" is done. See preview: http://www.flickr.com...

      January 11, 2014

  • Marc J.

    Thanks to everyone who has RSVPed for this Meetup. If you haven't already done so, please complete the Google Form at https://docs.google.com/forms/d/1rXggRHhlprYfHsB64dZ-zpVrhqMitcPkN1vEMD7_03k/viewform. Also, I have made further updates to the resource page at http://pdfliberation.wordpress.com. There is now a pretty comprehensive list of PDF extraction tools on that page. If you are aware of others that I should list, please tell me.

    November 30, 2013

  • Marc J.

    Tabula was mostly developed by a young guy from Argentina. It is going to need a larger community and perhaps some financial support to become a tool capable of supporting large-scale production work. I hope the hackathon will help get Tabula to the next level and/or bring forward other tools that the open data community can rally around. See my resource page at http://pdfliberation.wordpress.com.

    November 21, 2013

Our Sponsors

  • O'Reilly Media

    Provides books and discounts for the group members.

  • Code for America

    CfA team members participate in the planning and organization.

  • Common Crawl

    CC team members participate in planning and organization.

  • Internet Archive

    Internet Archive team members participate in planning and organization.

  • Jetpac

    Jetpac team members participate in planning and organizing this group.

  • Kaggle

    Kaggle team members participate in the planning and organization.

  • Mendeley

    Mendeley team members participate in the planning and organization.

  • Open Knowledge Foundation

    OKF team members participate in the planning and organization.

  • Wikimedia

    Wikimedia team members participate in planning and organization.
