Details

In this hands-on workshop, you'll use FiftyOne and the High Quality Invoice Images for OCR dataset to run the full data-centric loop end-to-end: embed invoices with a modern visual document model, cluster them by structure, run LightOnOCR as your base model, and use per-sample evaluation scores layered onto embedding space to find *where* and *why* it fails.

Time, Date and Location

Sep 02, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

What You'll Walk Away With

  • A working FiftyOne pipeline for any document collection you own
  • A repeatable curation query that combines evaluation + embedding signals
  • A fine-tuned LightOnOCR checkpoint that demonstrably outperforms the base model on your invoices
  • The mental model that data curation — not architecture or hyperparameters — is the highest-leverage thing you can do to improve a document AI system
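The curation query in the second bullet can be sketched without any particular library. The example below is a minimal illustration, not the workshop's actual code. It assumes that each invoice already has a cluster id (from clustering its embedding), an OCR prediction, and a ground-truth transcription, and it uses character error rate (CER) as the per-sample evaluation score. The record fields (`cluster`, `ocr`, `truth`), the helper names, and the choice of CER are all assumptions made for illustration. In the workshop, the equivalent fields would live on FiftyOne samples.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance per reference character."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, reference) / len(reference)


def worst_clusters(samples, top_k=1):
    """Rank clusters by mean CER, worst (highest) first."""
    by_cluster = defaultdict(list)
    for s in samples:
        by_cluster[s["cluster"]].append(cer(s["ocr"], s["truth"]))
    means = {c: sum(v) / len(v) for c, v in by_cluster.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy records; real ones would come from your own dataset.
samples = [
    {"id": "a", "cluster": 0, "ocr": "Total: 100", "truth": "Total: 100"},
    {"id": "b", "cluster": 1, "ocr": "Tota1: 1O0", "truth": "Total: 100"},
    {"id": "c", "cluster": 1, "ocr": "Invoice", "truth": "Invoice"},
]
print(worst_clusters(samples))
```

Ranking clusters rather than individual samples is the point of combining the two signals: a cluster with a high mean error suggests a structural failure mode (a layout or template the model handles poorly) rather than a one-off bad scan, which makes it a natural candidate for targeted fine-tuning data.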

Related topics

Artificial Intelligence
Computer Vision
Augmented Reality
Open Source

Sponsors

Voxel51

Administration, promotion, giveaways and charitable contributions.
