Details

Join NYC AI from Scratch for our next speaker series event! This time we will venture beyond the hype of LLMs to explore how AI systems can integrate data from multiple modalities (like language, vision, or audio) to make them more powerful -- and to more closely resemble what the human brain does every day.

If you want to learn how these cutting-edge models work and stimulate your imagination about possible AI applications of the future, don't miss out on this event!

Networking with fellow AI enthusiasts will follow the talk. (And there will be snacks.)

Talk Description
Understanding Multi-Modal AI Models
This talk will take a deep dive into the core theories and mechanisms behind multi-modal AI models, the powerful AI systems that integrate intelligence from diverse data sources such as text, audio, and vision. While we will reference popular applications like GPT Vision, Gemini, and Sora as illustrative examples, our focus will primarily be on understanding the foundational principles that enable capabilities such as real-time image/video analysis, hyper-segmentation of images, and image-to-text extraction, among others.

Related topics

Events in New York, NY
AI/ML
Artificial Intelligence
Artificial Intelligence Machine Learning Robotics
Machine Learning
Open Source