(Virtual) Hands-on Apache Tika tika-eval workshop (Part 1)

Hosted By
Tim A.

Details
This will be a virtual, hands-on workshop to introduce the capabilities of the tika-eval module. This workshop is designed for those interested in:
- profiling files (digests, mime types)
- profiling text extracted from files (number of tokens, automatic language detection, out-of-vocabulary statistics for junk detection)
- comparing text extracted from different text extractors.
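As a rough illustration of the kinds of statistics listed above, here is a minimal Python sketch. It is not tika-eval's implementation: the function names, the regex tokenizer, and the tiny vocabulary are assumptions made for this example only.

```python
import hashlib
import re
from collections import Counter


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def profile_text(text: str, vocabulary: set) -> dict:
    """Compute simple token statistics for extracted text.

    `vocabulary` is a hypothetical set of known words; the share of
    tokens outside it is used here as a crude junk indicator.
    """
    # Assumption: lowercase and split on runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    oov = sum(n for token, n in counts.items() if token not in vocabulary)
    return {
        "num_tokens": len(tokens),
        "num_unique_tokens": len(counts),
        "oov_ratio": oov / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    vocab = {"the", "cat", "sat", "on"}
    print(sha256_digest(b"example bytes"))
    print(profile_text("The cat sat on the mat", vocab))
```

A high out-of-vocabulary ratio in text extracted from a file suggests, but does not prove, that the extractor produced junk.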
There will be a heavy emphasis on processing PDF files.
Attendees should be comfortable running tika-app from the command line or sending requests with curl to a local tika-server. See the link below for prerequisites (still a work in progress).
https://cwiki.apache.org/confluence/display/TIKA/Apache+Tika+Meetups
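
For attendees who want to check their setup beforehand, the sketch below shows one way to send a file to a local tika-server from Python. It assumes the server is running on its default port, 9998, and uses the `/tika` endpoint with a PUT request. The function names are hypothetical and this is not part of the official prerequisites.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_extract_request(data: bytes, server: str = TIKA_SERVER) -> urllib.request.Request:
    """Build a PUT request asking tika-server for plain-text extraction."""
    return urllib.request.Request(
        url=server + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, server: str = TIKA_SERVER) -> str:
    """Send the file at `path` to tika-server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_extract_request(f.read(), server)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With curl, a comparable request would look something like `curl -T example.pdf -H "Accept: text/plain" http://localhost:9998/tika`, where `example.pdf` is a placeholder file name.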

Apache Tika Community (Virtual)
Online event