Hands‑On: Build a Vision‑Powered AI Agent in Python


Details
Join us at PyData Huddersfield for a hands‑on session where we’ll build a Python AI agent that interprets and responds to visual input. Using accessible open‑source tools, you’ll design an end‑to‑end system that can “see” images from a webcam feed, a screenshot, or an upload and return useful natural‑language output.
We’ll cover three stages: capturing image data; extracting signals from it (objects, text via OCR, labels); and triggering data‑aware responses from an LLM or an automation step. You’ll work hands‑on with tools such as OpenCV and Hugging Face vision models, along with lightweight patterns for orchestrating the steps.
Drawing on Mujadded’s work in railway safety and logistics automation, we’ll look at hazard‑detection sketches, image‑tagging workflows, and whiteboard capture. You’ll leave with runnable starter code and a clear path to adding perception to your own AI agents.
