Skip to content

Details

Our third stream in the Python + AI series is all about vision models!

Vision models are LLMs that can accept both text and images, like GPT 4o and 4o-mini. You can use those models for image captioning, data extraction, question-answering, classification, and more!

We'll use Python to send images to vision models, build a basic chat-on-images app, and build a multimodal search engine.

This session is a part of a series! To learn more, click here

Pre-requisites:
If you'd like to follow along with the live examples, make sure you've got a GitHub account.

Habla español? Tendremos una serie para hispanohablantes!

Members are also interested in