AgroHackathon Message Board › Hack 14: Information extraction from publications on viticulture research

Mihalis P.
user 204520102
Glifáda, GR
Post #: 1

Bibliographic resources in PDF format contain rich information about the resource itself, yet this information is rarely captured in the metadata that accompanies the resource.
This challenge is being addressed by a European initiative, the OpenMinTed Project (http://openminted.eu/), and focuses on extracting information from publications on viticulture research. More specifically, the aim is to extract information from publications in PDF format, or to reuse already-extracted information, and to annotate each resource with richer metadata. This extraction may include, but is not limited to, (a) image and caption extraction and (b) topic annotation. Any data mining algorithm may be applied.
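For (a), when the text has already been extracted from the PDF, captions can often be recovered from their typographic pattern alone. Here is a minimal sketch, assuming a caption begins a line with "Figure", "Fig." or "Table", followed by a number and a "." or ":" separator. Real layouts vary, and pairing each caption with its image would need a PDF library, which is not shown here.

```python
import re

# Assumption: a caption starts a line with "Figure", "Fig." or "Table",
# then a number and a "." or ":" separator, then the caption text.
CAPTION_RE = re.compile(
    r"^(?P<kind>Fig(?:ure)?\.?|Table)[ \t]*(?P<num>\d+)[ \t]*[.:][ \t]*(?P<text>\S.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_captions(text):
    """Return (kind, number, caption) tuples found in already-extracted text."""
    captions = []
    for m in CAPTION_RE.finditer(text):
        kind = "table" if m.group("kind").lower().startswith("tab") else "figure"
        captions.append((kind, int(m.group("num")), m.group("text").strip()))
    return captions
```

Requiring the "." or ":" separator means that running prose such as "Figure 4 shows the trend" is not mistaken for a caption.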

Datasets
For this challenge, a sample dataset of publications on viticulture research is available here. Each PDF in it has been manually annotated with richer information by a domain expert. The dataset can be used for both testing and evaluation.
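Because each PDF in the sample dataset carries expert annotations, a system's output can be scored against them. Here is a minimal sketch of micro-averaged precision, recall and F1. It assumes that both the expert annotations and the system output have been reduced to a set of labels per document. The dataset's actual file format is not described here, so the loading step is left out.

```python
def evaluate(predicted, gold):
    """Micro-averaged precision, recall and F1 over per-document label sets.

    predicted, gold: dicts mapping a document ID to a set of annotation labels.
    A document missing from one side counts as having no labels there.
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | predicted.keys():
        pred = predicted.get(doc_id, set())
        ref = gold.get(doc_id, set())
        tp += len(pred & ref)   # labels both the system and the expert gave
        fp += len(pred - ref)   # labels only the system gave
        fn += len(ref - pred)   # labels only the expert gave
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Micro-averaging pools the counts over all documents, so documents with many annotations weigh more. Averaging the per-document scores instead (macro-averaging) is an equally reasonable choice.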

In this challenge, the following taxonomies for grape varieties and ampelographic descriptions can be used:
Vitis ontology (JSON)
Agrovoc & FAO Geopolitical Ontology (SPARQL)
Grape Varieties (SPARQL)
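A straightforward way to use these taxonomies for annotation is dictionary matching: collect the labels of grape varieties (or ampelographic terms) and tag every mention of them in the text. Here is a minimal sketch, assuming the labels have already been retrieved from one of the resources above (via SPARQL or from the JSON file) into a dict mapping each label to a concept ID. The retrieval step is not shown, and any concept IDs you pass in for testing are hypothetical placeholders.

```python
import re


def annotate(text, labels):
    """Tag every whole-word, case-insensitive mention of a vocabulary label.

    labels: dict mapping a surface label (e.g. a grape variety name) to a
    concept ID. Returns (start, end, matched_text, concept_id) tuples.
    """
    if not labels:
        return []
    lookup = {label.lower(): cid for label, cid in labels.items()}
    # Longest labels first, so the alternation prefers the longest match.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Sorting longer labels first means that "Cabernet Sauvignon" is tagged as one concept rather than as "Cabernet" followed by unmatched text.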

We suggest using open libraries and APIs to complete the challenge. One such option is http://api-dev.freme-project.eu/doc/api-doc/full.html, to which you can send text and receive a list of candidate topics and entities extracted from it, along with their scores.
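Calling a service like this from Python needs only the standard library. Here is a minimal sketch with two explicit assumptions. First, the real API's path and query parameters are not reproduced here, so take them from its documentation. Second, the JSON response shape used below, {"entities": [{"label": ..., "score": ...}]}, is hypothetical and only illustrates filtering and ranking results by score.

```python
import json
import urllib.request


def build_request(endpoint, text):
    """Prepare (but do not send) a POST request carrying plain text.

    endpoint: the full service URL; the parameters the real API expects
    must be added from its documentation (not shown here).
    """
    return urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def top_entities(response_body, min_score=0.5):
    """Keep entities scoring at least min_score, best first.

    Assumes a HYPOTHETICAL response shape:
    {"entities": [{"label": "...", "score": 0.87}, ...]}
    """
    entities = json.loads(response_body).get("entities", [])
    kept = [(e["label"], e["score"]) for e in entities if e["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

To send the request, use `with urllib.request.urlopen(req) as resp: body = resp.read()`. Before relying on `top_entities`, adapt it to the service's actual response format.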
catherine R.
user 203561110
Clermont-Ferrand, FR
Post #: 2
We have our own French PDF corpus and our own SKOS resource, which we would like to use to annotate the corpus. The corpus consists of French agricultural alert bulletins, and the resource covers the different types of crops grown in France. So we are interested in seeing how to handle PDFs with AgroPortal, and perhaps in discussing the output of the AgroPortal Annotator.
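If the SKOS resource is serialized as RDF/XML, its labels can be loaded into a lookup table with the standard library and then used for dictionary-style annotation of the corpus. Here is a minimal sketch with a few limits. It only handles concepts written as skos:Concept elements (not rdf:Description with an rdf:type). A Turtle file would need an RDF library such as rdflib. The concept URIs in the sample are hypothetical. This is a local alternative to the AgroPortal annotator, not a reproduction of it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def load_skos_labels(rdf_xml):
    """Map each lower-cased skos:prefLabel / skos:altLabel to its concept URI."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        for tag in ("prefLabel", "altLabel"):
            for el in concept.findall(f"{{{SKOS}}}{tag}"):
                if el.text and el.text.strip():
                    labels[el.text.strip().lower()] = uri
    return labels


# Tiny HYPOTHETICAL sample in the same shape as a real SKOS RDF/XML file.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/crops/ble-tendre">
    <skos:prefLabel xml:lang="fr">blé tendre</skos:prefLabel>
    <skos:altLabel xml:lang="fr">froment</skos:altLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/crops/vigne">
    <skos:prefLabel xml:lang="fr">vigne</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""
```

Indexing altLabels as well as prefLabels lets synonyms such as "froment" resolve to the same concept.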
Stéphan
user 204166189
Clermont-Ferrand, FR
Post #: 1
I'm interested in the automatic annotation of PDF files. The format is particularly difficult to work with, yet it is so widely used that a great deal of information is held in PDF documents.
And I'm also interested in wine :-)
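Part of what makes PDF hard is that text extraction returns layout lines, not sentences. Words are split by end-of-line hyphens, and paragraphs are broken at every line. Here is a minimal cleanup sketch, assuming the raw text has already been extracted by some PDF tool. There is one trade-off: it also merges genuine hyphenated compounds that happen to fall at a line end (for example, "well-" followed by "known" on the next line becomes "wellknown").

```python
import re


def clean_pdf_text(raw):
    """Undo two common artifacts of PDF text extraction.

    1. A word split by a hyphen at a line break ("viticul-" / "ture") is rejoined.
    2. Single line breaks inside a paragraph become spaces; blank lines
       (paragraph breaks) are kept.
    """
    # Drop "-\n" only when a letter sits on both sides of it.
    text = re.sub(r"(?<=[^\W\d_])-\n(?=[^\W\d_])", "", raw)
    # Split into paragraphs on blank lines, then collapse whitespace in each.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```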
Mihalis P.
user 204520102
Glifáda, GR
Post #: 2
Excellent then, I look forward to discussing this tomorrow!
Mihalis P.
user 204520102
Glifáda, GR
Post #: 3
Sample API call: http://52.18.30.225:8...
Michael D.
user 10124239
Palo Alto, CA
Post #: 3
Just FYI, I've added the team photos to the hack-14 folder of the repo.