
Seeing is Believing: A Hands-On Tour of Vision-Language Models

Network event
34 attendees from 3 groups hosting
Hosted by Raj M.

Details

Vision-Language Models (VLMs) are transforming how AI sees, understands, and explains the world. From image captioning to multimodal reasoning, they power the next wave of intelligent applications.
In this live session, we’ll:

  • Break down the fundamentals of how Vision-Language Models process both text and images
  • Showcase lightweight VLMs (SmolVLM, MiniCPM-o, Qwen-VL, etc.) that can run on modest hardware
  • Demonstrate real-time examples: image captioning, visual Q&A, and multimodal retrieval (see the captioning sketch after this list)
  • Compare open-source VLMs with larger commercial ones, and discuss where small models shine
  • Share tips on deploying VLMs for startups, apps, and research projects
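To ground the "modest hardware" claim, here is a minimal image-captioning sketch using SmolVLM via the Hugging Face transformers library. The model ID follows the public SmolVLM-Instruct model card, but the image path and prompt are placeholders; treat this as an illustrative starting point, not the session's actual demo notebook.

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# ~2B-parameter open VLM; small enough for a single consumer GPU (or CPU, slowly).
MODEL_ID = "HuggingFaceTB/SmolVLM-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to(device)

image = Image.open("example.jpg")  # placeholder: any local image

# VLM chat template: interleave an image slot with a text instruction.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

# Generate a caption; swapping the text for a question turns this into visual Q&A.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

The same few lines cover the visual Q&A demo: only the text prompt changes, which is exactly why lightweight VLMs are attractive for startups and apps.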

🔹 Format: Demo-driven walkthrough + interactive Q&A
🔹 Who’s it for: AI engineers, product managers, researchers, and builders curious about multimodal AI
🔹 Takeaway: A working understanding of VLMs, access to demo notebooks, and ideas for real-world applications

New Jersey Artificial Intelligence Meetup Group
FREE