Part of AI, Machine Learning and Computer Vision Meetup Network - 52 groups

Seattle AI, Machine Learning and Computer Vision Meetup

4.7•51 ratings

About us

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.

Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.

Are you interested in speaking at a future Meetup?
Is your company interested in sponsoring a Meetup?

Send me a DM on LinkedIn

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events


Network event
Aug 4 - Visual AI in Manufacturing
Tue, Aug 4 · 9:00 AM PDT · Online
204 attendees from 52 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics at the intersection of manufacturing, AI, ML, and computer vision.

Date, Time and Location

Aug 04, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Enabling Multimodal Agents on the Edge

The next generation of AI agents is moving beyond cloud-based, text-only models and will interact with the physical, multimodal world in real time. In the vision domain, for example, AI agents rely on Vision-Language Models (VLMs) as their backbone. However, deploying massive VLMs with billions of parameters on edge devices remains a significant engineering hurdle.

Drawing on our recent ICML and CVPR research papers, this session explores advances in agentic model optimization, specifically how distillation and pruning transform 'heavyweight' models into lean, edge-ready engines. Lastly, I present a UI agent, under development by our lab's team, running on an actual phone.

About the Speaker

Denis Gudovskiy is a Distinguished AI Engineer at Panasonic North America, where he conducts R&D on core AI methods, including multimodal and hardware-efficient agents, supervised and RL training pipelines, and robustness to out-of-distribution scenarios.

When the Camera Can’t Be Trusted: Health-Aware Visual AI for Reliable Near-Miss Detection

Near-miss detection systems are often evaluated as though every camera frame is equally trustworthy, even though blur, poor exposure, occlusion, contamination, and changing lighting can silently degrade the visual evidence used to make safety decisions. This talk presents an online camera-health framework that estimates visual reliability before downstream perception performance significantly deteriorates.

I will discuss how camera-health signals can support condition-aware evaluation, prioritize human review, reduce unreliable alerts, and trigger appropriate fallback behavior. Drawing from research in safety-critical visual perception, the talk will demonstrate how these principles can be adapted to industrial video systems operating across different cameras, shifts, layouts, and environmental conditions.

The presentation will also connect camera-health monitoring with rare-event discovery and failure-driven dataset improvement for more trustworthy near-miss detection.

About the Speaker

Shiva Aher is a computer vision researcher with a graduate background in computer science from the Georgia Institute of Technology, specializing in artificial intelligence.

Agentic VLM applications in manufacturing

Vision Language Models (VLMs) introduce net-new functionality to vision workloads in manufacturing that traditional computer vision models simply do not offer (e.g., open-vocabulary detection, in-context learning). Even so, fine-tuned models like YOLO offer a level of precision and recall that today's VLMs struggle to match out of the box.

Through agentic harnesses that coordinate calls to VLMs, we can start to deliver similar reliability on manufacturing-relevant tasks (e.g., many-class, many-instance detection), while also supporting the net-new functionality (e.g., multimodal search) that makes VLMs distinct. In this talk, we walk through the design of these harnesses, how to serve them efficiently, and how they deliver value in manufacturing.

About the Speaker

Subraiz Ahmed is a member of the Technical Staff at Perceptron AI. He builds the infrastructure to serve frontier vision models. He previously founded a series of startups.
2 attendees from this group
Network event
Aug 6 - Audio and AI Meetup
Thu, Aug 6 · 9:00 AM PDT · Online
209 attendees from 51 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Date, Time and Location

Aug 06, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Do Speech Models Actually Understand Speech? Evaluating Speech LLMs Under Realistic Spoken Instruction Conditions

Speech Large Language Models (SLLMs) are increasingly capable, but are we evaluating them the right way? Most benchmarks rely on text prompts, yet real users interact with these systems through speech, a modality that introduces noise, disfluencies, and stylistic variation that text simply doesn't capture.
In this talk, we present findings from a systematic study across 11 tasks, 12 languages, and five prompt styles, examining how prompt modality, language, and task type shape SLLM performance.

About the Speaker

Maike Züfle is a PhD student at the Karlsruhe Institute of Technology (KIT), working in Prof. Jan Niehues's group on interactive speech systems for more natural human–machine communication. Her research focuses on instruction-following speech models with speech as both input and output, with a recent emphasis on full-duplex systems. Beyond her research, she co-organises the instruction-following and speech translation metrics shared tasks at IWSLT. She is a 2026 Apple Scholar in AI/ML.

AI-based Audio Forensics

In this presentation, attendees will discover several modules developed by Gradiant for the detection and analysis of synthetically generated or manipulated audio. The session will be delivered by one of the developers involved in the design and implementation of these technologies, providing first-hand insight into their capabilities and underlying methodology.

The presentation will cover the traceability module, which helps identify the origin of AI-generated content. It will also cover the segment detection tool, designed to locate manipulated regions within an audio recording, as well as the complete audio detection tool, which assesses whether an entire recording has been synthetically generated.

About the Speaker

Daniel Paniagua Ares is a research engineer at Gradiant. He graduated in computer engineering from the FIC and holds a master's degree in AI from the VIU.

Curating, Searching, and Evaluating Audio Datasets in FiftyOne

In this talk, we'll start with the ESC-50 environmental-sound dataset to show how FiftyOne represents audio: browsing clips in the tabular view, rendering spectrograms directly in the sample grid with a custom renderer, and turning sounds into searchable vectors with CLAP embeddings. Then we'll demo a similarity-search panel that lets you query an entire audio collection by example clip or a natural-language prompt to quickly find matching sounds.

We'll conclude with a live research problem: Audio Moment Retrieval from the DCASE 2026 Challenge, where the goal is to localize the exact moment in a long recording that matches a text query. We'll frame this as temporal detection, evaluate predictions, and visualize ground-truth vs. predicted moments on an interactive timeline to intuitively expose model failure modes.

Attendees will leave with a concrete blueprint and open code for applying visual data-centric AI practices to their own audio and multimodal datasets.

About the Speaker

John Duncan is a Machine Learning Engineer, Customer Success at Voxel51. His research interests include vision, LiDAR, and audio perception for robots and intelligent systems.

Real-Time ASR at 4x on Consumer Hardware: The Meetily Architecture

This talk covers the engineering behind Meetily, an open-source meeting assistant that runs Whisper and NVIDIA Parakeet transcription entirely on-device. We'll walk through how we got Parakeet to roughly 4x real-time on consumer hardware, and the specific points where it still falls over.

We'll also get into the honest trade-offs between local and cloud inference: latency, accuracy, cost, and what you actually give up by choosing one over the other. Wrapping ML inference in a Rust/Tauri desktop app came with its own costs, which we'll unpack as well.

Finally, we'll look at what "fully local" really means at an architecture level, where that boundary sits, and how easily it leaks once you add model downloads, integrations, or a pluggable LLM backend.

About the Speaker

Sandeep Zachariah is the Founder and CEO of Zackriya Solutions and leads the team behind Meetily, an open-source, privacy-first meeting assistant that runs Whisper and NVIDIA Parakeet transcription entirely on-device. He brings a rare full-stack perspective on audio AI, from low-level embedded systems and hardware acceleration up through real-time ASR and local LLM summarization, with deep experience deploying speech and ML models across servers, GPUs, and consumer hardware.
1 attendee from this group
Network event
Aug 11 - Debugging Physical AI Models at Scale with Multimodal Data Workshop
Tue, Aug 11 · 9:00 AM PDT · Online
135 attendees from 52 groups
Join Voxel51 for a live workshop on how multimodal data workflows in FiftyOne help teams inspect, search, and debug complex Physical AI datasets and explain black-box model behavior at scale. We’ll show how teams can work with synchronized video and sensor data, query for similar scenarios across their datasets, and uncover patterns behind model failures faster than playback-only visualization tools allow.

Date, Time and Location

Aug 11, 2026
9:00 AM - 10:00 AM PDT
Online. Register for the Zoom!

As robotics and autonomous vehicle teams move from traditional perception models to end-to-end Physical AI systems, understanding model behavior is becoming harder than ever. These models ingest synchronized inputs from cameras, sensors, and other data streams, but their decisions can be difficult to explain, reproduce, and improve.

You’ll learn how to use multimodal data to investigate questions like: when did the model swerve, miss an object, misinterpret a scene, or behave unexpectedly — and how can you find every similar moment across your dataset?

Designed for robotics, AV, and machine learning teams, this session will show how FiftyOne helps turn multimodal data into a scalable workflow for model evaluation, debugging, and improvement.