Building Multimodal Agents With NVIDIA Nemotron and RAPIDS - Part One

Name: Building Multimodal Agents With NVIDIA Nemotron and RAPIDS - Part One
Start: 2026-06-30T18:00:00+02:00
End: 2026-06-30T21:00:00+02:00

Hosted by Antonio Rueda T.

Super Organizer

Berlin Computer Vision Group

Details

This two-part workshop covers building agents that read documents, extract text via OCR, and run topic modeling with NVIDIA Nemotron and RAPIDS. Both sessions start with concepts and end with code you can experiment with. No multimodal pipeline experience required.

Build multimodal extraction pipelines with Nemotron 3 Nano Omni and Nemotron Parse through NVIDIA NIMs. Turn charts, tables, screenshots, scanned documents, and screen recordings into structured artifacts for AI agents.

In this workshop you will learn how to:

Call Nemotron Parse and Nemotron 3 Nano Omni from Python through NIM endpoints
Build a document pipeline with OCR text, bounding boxes, tables, visual descriptions, and page context
Wrap Python logic as a LangGraph tool and connect model output to validated tool execution

Prerequisites:
Python 3 fundamentals, exploratory data analysis workflows, and basic microservice concepts.

Berlin Computer Vision Group

Building Multimodal Agents With NVIDIA Nemotron and RAPIDS - Part One

Berlin Computer Vision Group

Details

Related topics

You may also like