Details

In this hands-on workshop, you'll use FiftyOne and the High Quality Invoice Images for OCR dataset to run the full data-centric loop end to end. You'll embed invoices with a modern visual document model, cluster them by structure, run LightOnOCR as your base model, and layer per-sample evaluation scores onto the embedding space to find *where* and *why* the model fails.

Time, Date and Location

Sep 02, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

What You'll Walk Away With

  • A working FiftyOne pipeline for any document collection you own
  • A repeatable curation query that combines evaluation + embedding signals
  • A fine-tuned LightOnOCR checkpoint that demonstrably outperforms the base model on your invoices
  • The mental model that data curation — not architecture or hyperparameters — is the highest-leverage thing you can do to improve a document AI system
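
To make the idea of a curation query that combines evaluation and embedding signals concrete, here is a minimal sketch in plain Python. It does not use FiftyOne or LightOnOCR. It assumes each invoice already has a cluster label (from clustering its embedding), an OCR output, and a ground-truth transcription, and it assumes character error rate (CER) as the per-sample score, which may not be the metric used in the workshop. The record fields (`cluster`, `ocr_text`, `truth`) and the sample invoices are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(prediction: str, truth: str) -> float:
    """Character error rate: edit distance divided by ground-truth length."""
    if not truth:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, truth) / len(truth)


def worst_clusters(samples, min_size=2):
    """Rank clusters by mean CER, highest first, skipping tiny clusters."""
    by_cluster = defaultdict(list)
    for s in samples:
        by_cluster[s["cluster"]].append(cer(s["ocr_text"], s["truth"]))
    ranked = [(c, mean(scores), len(scores))
              for c, scores in by_cluster.items() if len(scores) >= min_size]
    return sorted(ranked, key=lambda r: r[1], reverse=True)


# Hypothetical records: in practice the cluster label would come from
# clustering the embeddings and ocr_text from the OCR model's output.
samples = [
    {"id": "inv-1", "cluster": 0, "ocr_text": "Total: 120.00", "truth": "Total: 120.00"},
    {"id": "inv-2", "cluster": 0, "ocr_text": "Total: 98.50", "truth": "Total: 98.50"},
    {"id": "inv-3", "cluster": 1, "ocr_text": "T0tal: 4S.10", "truth": "Total: 45.10"},
    {"id": "inv-4", "cluster": 1, "ocr_text": "Totl 77.00", "truth": "Total: 77.00"},
]

for cluster, mean_cer, size in worst_clusters(samples):
    print(f"cluster {cluster}: mean CER {mean_cer:.3f} over {size} samples")
```

Clusters at the top of the ranking are candidates for closer inspection and targeted fine-tuning data. The `min_size` cutoff is an illustrative choice that keeps one-off outliers from dominating the list.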

Related topics

Artificial Intelligence
Computer Vision
Machine Learning
