
Building a Lightweight Discovery Interface for Chinese Patents

Presented by Open Source Connections:

The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is the story of the Global Patent Search Network (GPSN), the next-generation multilingual search platform for the USPTO. GPSN was the first public application deployed in the cloud, and it allowed a very small development team to build a discovery interface across millions of patents.

This case study will cover: 

 • How we leveraged the Amazon Web Services platform for data ingestion, auto scaling, and deployment at a very low price compared to traditional data centers.

 • Some of the innovative methods we used to convert XML-formatted data into usable information.

 • How we parsed through 5 TB of raw TIFF image data and converted it to a modern, web-friendly format.

 • The challenges of building a modern Single Page Application that provides a dynamic, rich user experience.

 • How we built “data sharing” features into the application to allow third-party systems to build additional functionality on top of GPSN.



  • Eugene D.

    Eric, you mentioned a couple of tools during this talk: one is a Node.js interface for Solr, and the other is a framework you built with Ember.js for the front end. Can you post links to those as well?

    May 29, 2014

    • Eric P.

      See my blog post!

      May 30, 2014

  • John Peter S.

    Eric, a very good presentation. I like Tika. It's a very useful tool. I haven't studied Ember.js, but I hear it's a good platform. What was the logic used to shard the index? Was it a range of indices corresponding to time, lex order?


    May 29, 2014

    • Eric P.

      Ember.js is great, but it definitely has a ramp-up. If I am building a single page app, Ember is great: it has my data model, controllers, everything. If I am building widgets to sprinkle about an existing app, then AngularJS is a bit easier. As far as the sharding that we did, we just did a MOD on the primary key of each document, which was the patent_id and kind_code combined. We didn't get fancy about time, because it turns out time isn't a key facet for examiners. They want recall over relevancy.

      May 30, 2014
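
The modulo sharding described in the reply above can be illustrated with a minimal sketch. This is not the GPSN code. The function name `shard_for`, the sample identifiers, and the shard count are hypothetical. The use of CRC32 to turn a string key into an integer is also an assumption, since the reply only says a MOD was applied to the combined patent_id and kind_code.

```python
import zlib


def shard_for(patent_id: str, kind_code: str, num_shards: int) -> int:
    """Pick a shard by taking a modulo of the document's primary key.

    Assumption: the key is a string, so it is reduced to an integer with
    CRC32 first. CRC32 is stable across runs, unlike Python's built-in
    hash() for strings.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Primary key as described in the thread: patent_id and kind_code combined.
    primary_key = f"{patent_id}{kind_code}"
    return zlib.crc32(primary_key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the call.
    for pid, kind in [("CN100000001", "A"), ("CN100000002", "B"), ("CN100000003", "C")]:
        print(pid, kind, "-> shard", shard_for(pid, kind, num_shards=4))
```

Because the shard depends only on the key, the same document always routes to the same shard, and no date or range information is needed, which matches the point that time was not a key facet for examiners.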

  • Eric P.

    I posted a list of links that I mentioned, as well as an updated copy of the slides.

    May 30, 2014

  • Gary M.

    Excellent presentation by Eric and great questions from several attendees.

    May 29, 2014

  • Eugene D.

    Eric, excellent presentation. Very informative. Thank you very much.

    May 28, 2014

  • Nitin k.

    It would be great if this presentation could be recorded. Some people had to cancel at the last minute but really want to learn from it and would benefit a lot.
    Also, it would be great if the slides could be put up. Please let us know.

    May 28, 2014

  • Eric P.

    I'm super excited to be talking, and if folks have suggestions or areas they want to see covered, please let me know. Want to talk more about rich JavaScript for Solr? Want to dig down into the bowels of Tika-based data ingestion? Let me know!

    May 16, 2014

    • A former member

      Some topic areas I'd be interested in hearing about: overall app architecture - do you have a clean REST API layer? Security - can a user send a Solr delete command to delete all documents, any document or role-level security? SolrCloud - do you use it, how many shards, how did you decide how to shard, how many replicas and how you decided, and how dynamic is any of that? Who maintains and administers Solr and the app? Is the Solr config, schema, plugins, and app open source and available to us in GitHub or not, or does "the government" own it, or is it proprietary? Stats like average query time, frequency and count of updates, total number of documents and docs per shard. Finally, what level of expertise does an app like this require - can an "average" Solr user replicate this type of project? Thanks!

      May 28, 2014

    • Peter D.

      I'd be interested in hearing about any unique challenges of working with Chinese text, handling queries in English/Chinese, etc...

      May 28, 2014

  • Ananth K.


    March 17, 2014
