Skip to content

Details

This two-part workshop covers building agents that read documents, extract text via OCR, and run topic modeling with NVIDIA Nemotron and RAPIDS. Both sessions start with concepts and end with code you can experiment with. No multimodal pipeline experience required.

Build multimodal extraction pipelines with Nemotron 3 Nano Omni and Nemotron Parse through NVIDIA NIMs. Turn charts, tables, screenshots, scanned documents, and screen recordings into structured artifacts for AI agents.

In this workshop you will learn how to:

  1. Call Nemotron Parse and Nemotron 3 Nano Omni from Python through NIM endpoints
  2. Build a document pipeline with OCR text, bounding boxes, tables, visual descriptions, and page context
  3. Wrap Python logic as a LangGraph tool and connect model output to validated tool execution

Prerequisites:
Python 3 fundamentals, exploratory data analysis workflows, and basic microservice concepts.

Recommended Resources:
NVIDIA NIMs intro
NVIDIA RAPIDS intro
LangChain Academy's Intro to LangGraph

Related topics

Artificial Intelligence
Computer Vision
Machine Learning
Natural Language Processing
Image Processing

You may also like