Details

Join NYC AI from Scratch for our next speaker series event! This time we will venture beyond the hype of LLMs to explore how AI systems can integrate data from multiple modalities (like language, vision, or audio) to make them more powerful -- and to more closely resemble what the human brain does every day.

If you want to learn how these cutting-edge models work and stimulate your imagination about possible AI applications of the future, don't miss out on this event!

Networking with fellow AI enthusiasts will follow the talk. (And there will be snacks.)

Talk Description
Understanding Multi-Modal AI Models
This talk will take a deep dive into the core theories and mechanisms behind multi-modal AI models, the powerful AI systems that integrate intelligence from diverse data sources such as text, audio, and vision. While we will reference popular applications like GPT Vision, Gemini, and Sora as illustrative examples, our focus will primarily be on understanding the foundational principles that enable capabilities such as real-time image/video analysis, hyper-segmentation of images, and image-to-text extraction, among others.

Related topics

Events in New York, NY
AI/ML
Artificial Intelligence
Artificial Intelligence Machine Learning Robotics
Machine Learning
Open Source