
(Virtual) Hands-on Apache Tika tika-eval workshop (Part 1)

Hosted by Tim Allison

Details

This virtual, hands-on workshop introduces the capabilities of Apache Tika's tika-eval module. It is designed for those interested in:

  1. profiling files (digests, MIME types);
  2. profiling the text extracted from files (number of tokens, automatic language detection, out-of-vocabulary statistics for junk detection);
  3. comparing the text produced by different text extractors.

There will be a heavy emphasis on processing PDF files.

Attendees should be comfortable running tika-app from the command line or using curl to send requests to a local tika-server. Prerequisites (still a work in progress) are listed at the link below.

https://cwiki.apache.org/confluence/display/TIKA/Apache+Tika+Meetups

Apache Tika Community (Virtual)