Visual Document Understanding w/Multi-Modal Image & Text Mining in Spark OCR 3

Miami Hadoop User Group

Online event

Details

Register Here: https://events.johnsnowlabs.com/visual-document-understanding-with-multi-modal-image-text-mining-in-spark-ocr-3

The Transformer architecture has changed the way we analyze text in NLP. NLP models are great at processing digital text, but many real-world applications use documents with more complex formats. For example, healthcare systems often include visual lab results, sequencing reports, clinical trial forms, and other scanned documents. When we use only an NLP approach for document understanding, we lose layout and style information, which can be vital for understanding document images. New advances in multi-modal learning allow models to learn from both the text in documents (via NLP) and their visual layout (via computer vision).

We provide multi-modal visual document understanding, built on Spark OCR and based on the LayoutLM architecture. It achieves new state-of-the-art accuracy in several downstream tasks, including form understanding (from 70.7 to 79.3), receipt understanding (from 94.0 to 95.2), and document image classification (from 93.1 to 94.4).
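
To make the idea of combining text with layout concrete, here is a minimal sketch in plain Python of how OCR output might be paired with position information before it reaches a layout-aware model. It does not use the Spark OCR API. The `OcrWord` class, the helper names, and the sample words are hypothetical, and the 0 to 1000 coordinate scale is an assumption modeled on how LayoutLM-style models typically represent bounding boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class OcrWord:
    """Hypothetical container for one OCR result: a word and its pixel box."""
    text: str
    box: Box


def normalize_box(box: Box, page_width: int, page_height: int, scale: int = 1000) -> Box:
    """Rescale a pixel box to a page-size-independent 0..scale grid (assumed convention)."""
    x0, y0, x1, y1 = box
    return (
        scale * x0 // page_width,
        scale * y0 // page_height,
        scale * x1 // page_width,
        scale * y1 // page_height,
    )


def to_multimodal_input(words: List[OcrWord], page_width: int, page_height: int) -> List[Tuple[str, Box]]:
    """Pair each word with its normalized box so text and layout travel together."""
    return [(w.text, normalize_box(w.box, page_width, page_height)) for w in words]


# Illustrative data only: two words from an imaginary 850 x 1100 pixel lab report.
words = [
    OcrWord("Glucose", (100, 200, 300, 240)),
    OcrWord("5.4", (700, 200, 780, 240)),
]
pairs = to_multimodal_input(words, page_width=850, page_height=1100)
for text, box in pairs:
    print(text, box)
```

A text-only pipeline would keep just the words "Glucose" and "5.4". Keeping the boxes preserves the fact that both sit on the same row of the page, which is the kind of layout signal a multi-modal model can learn from.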
