(Virtual) Hands-on Apache Tika tika-pipes workshop (Part 2)

Hosted by Tim Allison
Details

This will be a virtual, hands-on workshop to go deeper into the capabilities of the tika-pipes module.

Specifically, this workshop will go into greater detail on configuration options for the OpenSearch and Solr emitters. It will also cover integration with tika-eval and the PipesReporter.

The tika-pipes modules improve robustness, network efficiency and scalability. Developers specify a data source (local file share, S3, GCS) and a target (local file share, S3, Apache Solr, OpenSearch) in a tika-config.xml file. At parse time, they only have to send a path or key to tika-server, which then fetches the bytes, safely parses the file and emits the parsed data to the specified target.
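As a rough illustration of "only send a path/key", here is a minimal Python sketch of what such a request might look like. It assumes tika-server is running locally on its default port 9998 with a tika-config.xml that defines a fetcher and an emitter. The `/pipes` endpoint path, the JSON field names, and the fetcher/emitter names `fsf` and `fse` are assumptions for illustration; check them against the documentation for your Tika version.

```python
import json
import urllib.request


def build_pipes_request(fetcher, fetch_key, emitter, emit_key):
    """Build the JSON body that names where to fetch from and where to emit to.

    The field names here are an assumption about the request format;
    verify them against the Tika docs for the version you run.
    """
    return {
        "fetcher": fetcher,      # name of a fetcher defined in tika-config.xml
        "fetchKey": fetch_key,   # path or key of the file to parse
        "emitter": emitter,      # name of an emitter defined in tika-config.xml
        "emitKey": emit_key,     # path or key for the emitted output
    }


def send_pipes_request(payload, server_url="http://localhost:9998/pipes"):
    """POST the payload to tika-server and return the response body as text.

    The endpoint path is an assumption; only the key travels over the
    network, not the file bytes.
    """
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        server_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With a server running, a call such as `send_pipes_request(build_pipes_request("fsf", "hello_world.xml", "fse", "hello_world.xml.json"))` would ask the server to fetch, parse and emit the hypothetical file `hello_world.xml`. The client never uploads the file itself, which is where the network efficiency comes from.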

While attendance at the first tika-pipes meetup is not required, attendees will be expected to have worked through some of the examples from the earlier workshop (see the link below for materials).

Attendees should be comfortable running tika-server with a configuration file. See the link below for prerequisites (still a work in progress).

https://cwiki.apache.org/confluence/display/TIKA/Apache+Tika+Meetups

Apache Tika Community (Virtual)