About us

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we’ll bring you two diverse speakers working at the cutting edge of computer vision.

  • Are you interested in speaking at a future Meetup?
  • Is your company interested in sponsoring a Meetup?

Contact the Meetup organizers!

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone

📣 Past Speakers

* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanya at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT

📚 Resources

* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links

Sponsors

Voxel51

Administration, promotion, giveaways and charitable contributions.

Upcoming events

14

  • Feb 12 - Seattle AI, ML and Computer Vision Meetup

    Join us to hear talks from experts on cutting-edge topics across AI, ML, and computer vision!

    Pre-registration is mandatory.

    Time and Location

    Feb 12, 2026
    5:30 - 8:30 PM

    Union AI Offices
    400 112th Ave NE #115
    Bellevue, WA 98004

    ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification

    In complex environments, anomalies are often highly contextual and ambiguous, so uncertainty quantification (UQ) is a crucial capability for a multimodal LLM (MLLM)-based video anomaly detection (VAD) system to succeed. In this talk, I will introduce ALARM, our UQ-supported, MLLM-based VAD framework. ALARM integrates UQ with quality-assurance techniques such as reasoning chains, self-reflection, and MLLM ensembling for robust and accurate performance, and is built on a rigorous probabilistic inference pipeline and computational process.
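
    For intuition only, here is a small, hypothetical sketch of one ingredient mentioned above: using disagreement across an MLLM ensemble as an uncertainty signal. It is not the ALARM implementation; the member models, threshold, and uncertainty cap are made up for illustration.

    ```python
    from statistics import mean, pstdev

    def ensemble_anomaly_decision(clip, members, threshold=0.5, max_uncertainty=0.15):
        """Hypothetical MLLM-ensemble scoring with a simple uncertainty gate.

        `members` is a list of callables, each wrapping one MLLM that maps a
        video clip to an anomaly probability in [0, 1]. Illustrative only.
        """
        scores = [m(clip) for m in members]   # one anomaly probability per model
        score = mean(scores)                  # ensemble estimate
        spread = pstdev(scores)               # disagreement as a crude UQ proxy

        if spread > max_uncertainty:
            # Too much disagreement: defer (e.g., trigger self-reflection/review)
            return {"label": "uncertain", "score": score, "uncertainty": spread}

        label = "anomaly" if score >= threshold else "normal"
        return {"label": label, "score": score, "uncertainty": spread}
    ```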

    About the Speaker

    Congjing Zhang is a third-year Ph.D. student in the Department of Industrial and Systems Engineering at the University of Washington, advised by Prof. Shuai Huang. She is a recipient of the 2025-2027 Amazon AI Ph.D. Fellowship. Her research interests center on large language models (LLMs) and machine learning, with a focus on uncertainty quantification, anomaly detection and synthetic data generation.

    The World of World Models: How the New Generation of AI Is Reshaping Robotics and Autonomous Vehicles

    World Models are emerging as the defining paradigm for the next decade of robotics and autonomous systems. Instead of depending on handcrafted perception stacks or rigid planning pipelines, modern world models learn a unified representation of an environment—geometry, dynamics, semantics, and agent behavior—and use that understanding to predict, plan, and act. This talk will break down why the field is shifting toward these holistic models, what new capabilities they unlock, and how they are already transforming AV and robotics research.

    We then connect these advances to the Physical AI Workbench, a practical foundation for teams who want to build, validate, and iterate on world-model-driven pipelines. The Workbench standardizes data quality, reconstruction, and enrichment workflows so that teams can trust their sensor data, generate high-fidelity world representations, and feed consistent inputs into next-generation predictive and generative models. Together, world models and the Physical AI Workbench represent a new, more scalable path forward—one where robots and AVs can learn, simulate, and reason about the world through shared, high-quality physical context.

    About the Speaker

    Daniel Gural leads technical partnerships at Voxel51, where he’s building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.

    Modern Orchestration for Durable AI Pipelines and Agents - Flyte 2.0

    In this talk we’ll discuss how the orchestration space is evolving with the current AI landscape and provide a peek at Flyte 2.0, which makes truly dynamic, compute-aware, and durable AI orchestration easy for any type of AI application, from computer vision to agents and more!

    Flyte, the open source orchestration platform, is already used by thousands of teams to build their AI pipelines. In fact, it’s extremely likely you’ve interacted with AI models trained on Flyte while browsing social media, listening to music, or using self-driving technologies.
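
    For readers new to Flyte, a minimal flytekit pipeline looks roughly like the sketch below, using the task and workflow decorators from the current open source release. The Flyte 2.0 capabilities previewed in the talk are not shown, and the task bodies are stand-ins.

    ```python
    from flytekit import task, workflow

    @task
    def curate(raw_path: str) -> str:
        # Stand-in for a data-curation step (e.g., decode and filter video frames)
        return raw_path + "/curated"

    @task
    def train(curated_path: str, epochs: int) -> float:
        # Stand-in for model training; returns a dummy metric
        return 0.9

    @workflow
    def vision_pipeline(raw_path: str = "s3://my-bucket/raw", epochs: int = 5) -> float:
        curated_path = curate(raw_path=raw_path)
        return train(curated_path=curated_path, epochs=epochs)

    if __name__ == "__main__":
        # Workflows can be executed locally for quick iteration
        print(vision_pipeline())
    ```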

    About the Speaker

    Sage Elliott is an AI Engineer at Union.ai (core maintainers of Flyte).

    Context Engineering for Video Intelligence: Beyond Model Scale to Real-World Impact

    Video streams combine vision, audio, time-series and semantics at a scale and complexity unlike text alone. At TwelveLabs, we’ve found that tackling this challenge doesn’t start with ever-bigger models — it starts with engineering the right context. In this session, we’ll walk engineers and infrastructure leads through how to build production-grade video AI by systematically designing what information the model receives, how it's selected, compressed, and isolated. You’ll learn our four pillars of video context engineering (Write, Select, Compress, Isolate), see how our foundation models (Marengo & Pegasus) and agent product (Jockey) use them, and review real-world outcomes in media, public-safety and advertising pipelines.

    We’ll also dive into how you measure context effectiveness — tokens per minute, retrieval hit rates, versioned context pipelines — and how this insight drives cost, latency and trust improvements. If you’re deploying AI video solutions in the wild, you’ll leave with a blueprint for turning raw video into deployable insight — not by model size alone, but by targeted context engineering.
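
    The metric names above come from the abstract; the formulas below are one plausible reading of them (hypothetical definitions, not TwelveLabs’ own).

    ```python
    def tokens_per_minute(context_tokens: int, video_minutes: float) -> float:
        # How much context budget the pipeline spends per minute of footage
        return context_tokens / video_minutes

    def retrieval_hit_rate(retrieved_ids: list, relevant_ids: set) -> float:
        # Fraction of retrieved video segments that were actually relevant
        if not retrieved_ids:
            return 0.0
        return sum(1 for rid in retrieved_ids if rid in relevant_ids) / len(retrieved_ids)

    # Example: 12,000 context tokens for a 40-minute video, and 8 of the 10
    # retrieved segments judged relevant
    print(tokens_per_minute(12_000, 40))                      # 300.0
    print(retrieval_hit_rate([f"seg{i}" for i in range(10)],
                             {f"seg{i}" for i in range(8)}))  # 0.8
    ```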

    About the Speaker

    James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem.

    Build Reliable AI apps with Observability, Validations and Evaluations

    As generative AI moves from experimentation to enterprise deployment, reliability becomes critical. This session outlines a strategic approach to building robust AI apps using Monocle for observability and the VS Code Extension for diagnostics and bug fixing. Discover how to create AI systems that are not only innovative but also predictable and trustworthy.

    About the Speaker

    Hoc Phan has 20+ years of experience driving innovation at Microsoft, Amazon, Dell, and startups. In 2025, he joined Okahu to lead product and pre-sales, focusing on AI observability and LLM performance. Previously, he helped shape Microsoft Purview via the BlueTalon acquisition and led R&D in cybersecurity and data governance. Hoc is a frequent speaker and author of three books on mobile development and IoT.

    21 attendees
  • Network event
    Feb 18 - Feedback-Driven Annotation Pipelines for End-to-End ML Workflows

    Online · 50 attendees from 16 groups

    In this technical workshop, we’ll show how to build a feedback-driven annotation pipeline for perception models using FiftyOne. We’ll explore real model failures and data gaps, and turn them into focused annotation tasks that route through a repeatable workflow for labeling and QA. The result is an end-to-end pipeline that keeps annotators, tools, and models aligned, closing the loop from annotation and curation back to model training and evaluation.

    Time and Location

    Feb 18, 2026
    10 - 11 AM PST
    Online. Register for the Zoom!

    What you'll learn

    • Label the data that matters most, saving annotation time and cost
    • Structure human-in-the-loop workflows to find and fix model errors and data gaps through targeted relabeling instead of bulk labeling
    • Combine auto-labeling and human review in a single, feedback-driven pipeline for perception models
    • Use label schemas and metadata as “data contracts” to enforce consistency between annotators, models, and tools, especially for multimodal data
    • Detect and manage schema drift and tie schema versions to dataset and model versions for reproducibility
    • Add QA and review steps that surface label issues early and tie changes back to model behavior
    • Build an annotation architecture that can accommodate new perception tasks and feedback signals without rebuilding your entire data stack
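
    As a rough illustration of the loop described above, here is a minimal FiftyOne sketch. The dataset name is hypothetical, and it assumes detection predictions plus a configured annotation backend (CVAT shown).

    ```python
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset("perception-train")  # hypothetical existing dataset

    # 1. Evaluate predictions to surface likely failures
    dataset.evaluate_detections("predictions", gt_field="ground_truth", eval_key="eval")

    # 2. Turn one failure mode (false positives) into a focused annotation task
    fp_view = dataset.filter_labels("predictions", F("eval") == "fp")

    # 3. Route that slice to the labeling tool for targeted relabeling and QA
    fp_view.annotate("fix_fp_round1", label_field="ground_truth", backend="cvat")

    # 4. After labeling, pull corrections back in and close the loop by re-evaluating
    dataset.load_annotations("fix_fp_round1")
    dataset.evaluate_detections("predictions", gt_field="ground_truth", eval_key="eval2")
    ```
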
    4 attendees from this group
  • Network event
    Feb 26 - Exploring Video Datasets with FiftyOne and Vision-Language Models

    Online · 16 attendees from 16 groups

    Join Harpreet Sahota for a virtual workshop to learn how to use Facebook's Action100M dataset and FiftyOne to build an end-to-end workflow.

    Date, Time and Location

    Feb 26, 2026
    9am - 10am Pacific
    Online. Register for the Zoom!

    Video is the hardest modality to work with. You're dealing with more data, temporal complexity, and annotation workflows that don't scale. This hands-on workshop tackles a practical question: given a large video dataset, how do you understand what's in it without manually watching thousands of clips?

    In this workshop you'll learn how to:

    • Navigate and explore video data in the FiftyOne App, filter samples, and understand dataset structure
    • Compute embeddings with Qwen3-VL to enable semantic search, zero-shot classification, and clustering
    • Generate descriptions and localize events using vision-language models like Qwen3-VL and Molmo2
    • Visualize patterns in your data through embedding projections and the FiftyOne App
    • Evaluate model outputs against Action100M's hierarchical annotations to validate what the models actually capture

    By the end of the session, you'll have a reusable toolkit for understanding any video dataset at scale, whether you're curating training data, debugging model performance, or exploring a new domain.
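
    For a flavor of what that workflow can look like in code, here is a hedged FiftyOne sketch; the clip directory is hypothetical, and the zoo CLIP model stands in for the Qwen3-VL embeddings used in the workshop.

    ```python
    import fiftyone as fo
    import fiftyone.brain as fob

    # Hypothetical folder of video clips standing in for Action100M
    dataset = fo.Dataset.from_videos_dir("/path/to/clips", name="video-explore")

    # Work at the frame level so an image embedding model can be applied;
    # CLIP from the FiftyOne model zoo is a stand-in for Qwen3-VL embeddings
    frames = dataset.to_frames(sample_frames=True)
    fob.compute_similarity(frames, model="clip-vit-base32-torch", brain_key="clip_sim")

    # Text-to-frame semantic search over the indexed embeddings
    view = frames.sort_by_similarity("a person opening a door", k=25, brain_key="clip_sim")

    # Inspect the results in the App
    session = fo.launch_app(view)
    ```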

  • Network event
    March 5 - AI, ML and Computer Vision Meetup

    Online · 24 attendees from 16 groups

    Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Date and Location

    Mar 5, 2026
    9 - 11 AM Pacific
    Online. Register for the Zoom!

    MOSPA: Human Motion Generation Driven by Spatial Audio

    Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling and motion synthesis. Despite its significance, this task remains largely unexplored. Most previous works have primarily focused on mapping modalities like speech, audio, and music to generate human motion, yet these models typically overlook the impact of the spatial features encoded in spatial audio signals on human motion.

    To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. For benchmarking, we develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA, which faithfully captures the relationship between body motion and spatial audio through an effective fusion mechanism. Once trained, MOSPA can generate diverse, realistic human motions conditioned on varying spatial audio inputs. We perform a thorough investigation of the proposed dataset and conduct extensive experiments for benchmarking, where our method achieves state-of-the-art performance on this task.

    About the Speaker

    Zhiyang (Frank) Dou is a Ph.D. student at MIT CSAIL, advised by Prof. Wojciech Matusik. He works with the Computational Design and Fabrication Group and the Computer Graphics Group.

    Securing the Autonomous Future: Navigating the Intersection of Agentic AI, Connected Devices, and Cyber Resilience

    With billions of connected devices now embedded in our infrastructure and AI emerging in the form of autonomous agents, we face a very real question: how can we create intelligent systems that are both secure and trusted? This talk will explore the intersection of agentic AI and IoT and demonstrate how the same AI systems can provide robust defense mechanisms. At its core, however, this is a challenge about trusting people with technology, ensuring their safety, and providing accountability. It therefore requires a new way of thinking, one in which security is built in, autonomous action has oversight, and, ultimately, innovation leads to greater human well-being.

    About the Speaker

    Samaresh Kumar Singh is an engineering principal at HP Inc. with more than 21 years of experience in designing and implementing large-scale distributed systems, cloud native platform systems, and edge AI / ML systems. His expertise includes agentic AI systems, GenAI / LLMs, Edge AI, federated and privacy preserving learning, and secure hybrid cloud / edge computing.

    Plugins as Products: Bringing Visual AI Research into Real-World Workflows with FiftyOne

    Visual AI research often introduces new datasets, models, and analysis methods, but integrating these advances into everyday workflows can be challenging. FiftyOne is a data-centric platform designed to help teams explore, evaluate, and improve visual AI, and its plugin ecosystem is how the platform scales beyond the core. In this talk, we explore the FiftyOne plugin ecosystem from both perspectives: how users apply plugins to accelerate data-centric workflows, and how researchers and engineers can package their work as plugins to make it easier to share, reproduce, and build upon. Through practical examples, we show how plugins turn research artifacts into reusable components that integrate naturally into real-world visual AI workflows.
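
    For those curious about what a plugin looks like in practice, a minimal Python operator is sketched below. The operator name and behavior are made up for illustration, and a real plugin also ships a fiftyone.yml alongside this file.

    ```python
    import fiftyone.operators as foo
    import fiftyone.operators.types as types

    class CountSelectedSamples(foo.Operator):
        @property
        def config(self):
            return foo.OperatorConfig(
                name="count_selected_samples",
                label="Count selected samples",
            )

        def resolve_input(self, ctx):
            # Declare a simple form the App renders when the operator runs
            inputs = types.Object()
            inputs.str("note", label="Note", required=False)
            return types.Property(inputs)

        def execute(self, ctx):
            # ctx exposes the App's current dataset, view, selection, and params
            return {"count": len(ctx.selected), "note": ctx.params.get("note")}

    def register(p):
        p.register(CountSelectedSamples)
    ```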

    About the Speaker

    Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51 with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.

    Transforming Business with Agentic AI

    Agentic AI is reshaping business operations by employing autonomous systems that learn, adapt, and optimize processes independently of human input. This session examines the essential differences between traditional AI agents and Agentic AI, emphasizing their significance for project professionals overseeing digital transformation initiatives. Real-world examples from eCommerce, insurance, and healthcare illustrate how autonomous AI achieves measurable outcomes across industries. The session addresses practical orchestration patterns in which specialized AI agents collaborate to resolve complex business challenges and enhance operational efficiency. Attendees will receive a practical framework for identifying high-impact use cases, developing infrastructure, establishing governance, and scaling Agentic AI within their organizations.

    About the Speaker

    Joyjit Roy is a senior technology and program management leader with over 21 years of experience delivering enterprise digital transformation, cloud modernization, and applied AI programs across insurance, financial services, and global eCommerce.

Members

648