May Meetup - Elsevier's Datasearch Platform & Harvesting data from PDFs

Hosted By
Charlie H.

Details

Our first talk is from Peter Cotroneo, Senior Product Manager at Elsevier, who will present DataSearch, Elsevier's award-winning search engine. DataSearch lets scientists and researchers search for data in many different types and formats across domain-specific and cross-domain institutional data repositories, as well as other data sources. Peter will discuss the challenges of building DataSearch, its technology stack and its future direction.

The second talk is from Michael Hardwick, founder and Managing Director of Elite Software, where he is responsible for PDF data extraction technology. Michael has a strong preference for C++ and a background in Mathematics, Physics and Astronomy. He will talk about harvesting data from PDFs, covering a short history of the format and how extraction tools must cope with typefaces, fonts, spacing, columns and reading order, encryption and more! If you've ever wrestled with indexing PDF data, this talk will be of interest.

Apache Lucene/Solr - London User Group
Huckletree
18 Finsbury Square, London EC2A 1AH