Part of Computer Vision Meetups - 16 groups

Ann Arbor AI, Machine Learning and Computer Vision Meetup

4.5•26 ratings

Ann Arbor, MI, US

626 members · Public group

What we’re about

🖖 This group is for data scientists, machine learning engineers, and open source enthusiasts.

Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.

Are you interested in speaking at a future Meetup?
Is your company interested in sponsoring a Meetup?

Send me a DM on Linkedin

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit theFiftyOne project page on GitHub.

Upcoming events

See all

Network event
Wed, Nov 5 · 12:00 PM EST
•
Online
Physical AI Data Pipelines with NVIDIA Omniverse NuRec, Cosmos and FiftyOne
Online
68 attendees from 16 groups
Join Voxel51 and NVIDIA as they unveil a breakthrough that’s changing how Physical AI systems are built. In this first-ever demo featuring NVIDIA Omniverse NuRec and NVIDIA Cosmos with FiftyOne, you’ll learn how to create validated, simulation-ready data pipelines—cutting testing costs, eliminating manual data audits, and accelerating development from months to days.

Date and Location

Nov 5, 2025
9:00-10:30 AM Pacific
Online. Register for the Zoom

Developing autonomous vehicles and humanoid robots requires rigorous simulations that capture real-world complexity. The critical barrier that keeps teams from achieving success isn’t the simulation engine itself, but the data that powers it.

As Physical AI systems ingest petabytes of multisensor data, converting this raw input into validated, simulation-ready data pipelines remains a hidden bottleneck. A camera-to-LiDAR projection off by a few pixels, timestamps misaligned by a few milliseconds, or inaccurate coordinate systems will cascade into flawed neural reconstructions and synthetic data.

Without a well-orchestrated data pipeline, even the most advanced simulation platforms end up consuming imperfect data, wasting weeks of effort and thousands of dollars in testing and compute costs.

In a first-ever demo featuring NVIDIA Omniverse NuRec and NVIDIA Cosmos with FiftyOne, you’ll discover how to:
- Eliminate manual data audits with an automated workflow that calibrates, aligns, and ensures data integrity across cameras, LiDAR, radar, and other sensors
- Curate and enrich the data for neural reconstructions and synthetic data generation
- Reduce Physical AI testing and QA costs by up to 80%
- Accelerate Physical AI development from months to days
Who should attend:
- Data Engineers, MLOps & ML Engineers working with Physical AI data
- Technical leaders and Managers driving Physical AI projects from prototype to production
- AV/Robotics Researchers building safety-critical apps with cutting-edge tech
- Product & Strategy leaders seeking to accelerate development while reducing infra costs and risks.
About the Speakers

Itai H Zadok is a Senior Product Manager l Autonomous Vehicles Simulation at NVIDIA

Daniel Gural is a Machine Learning Engineer and Evangelist at Voxel51
5 attendees from this group
Network event
Thu, Nov 6 · 12:00 PM EST
•
Online
Nov 6 - Visual Document AI: Because a Pixel is Worth a Thousand Tokens
Online
14 attendees from 16 groups
Join us for a virtual event to hear talks from experts on the latest developments in Visual Document AI.

Date and Location

Nov 6, 2025
9-11 AM Pacific
Online. Register for the Zoom!

Document AI: A Review of the Latest Models, Tasks and Tools

In this talk, go through everything document AI: trends, models, tasks, tools! By the end of this talk you will be able to get to building apps based on document models

About the Speaker

Merve Noyan works on multimodal AI and computer vision at Hugging Face, and she's the author of the book Vision Language Models on O'Reilly.

Run Document VLMs in Voxel51 with the VLM Run Plugin — PDF to JSON in Seconds

The new VLM Run Plugin for Voxel51 enables seamless execution of document vision-language models directly within the Voxel51 environment. This integration transforms complex document workflows — from PDFs and scanned forms to reports — into structured JSON outputs in seconds. By treating documents as images, our approach remains general, scalable, and compatible with any visual model architecture. The plugin connects visual data curation with model inference, empowering teams to run, visualize, and evaluate document understanding models effortlessly. Document AI is now faster, reproducible, and natively integrated into your Voxel51 workflows.

About the Speaker

Dinesh Reddy is a founding team member of VLM Run, where he is helping nurture the platform from a sapling into a robust ecosystem for running and evaluating vision-language models across modalities. Previously, he was a scientist at Amazon AWS AI, working on large-scale machine learning systems for intelligent document understanding and visual AI. He completed his Ph.D. at the Robotics Institute, Carnegie Mellon University, focusing on combining learning-based methods with 3D computer vision for in-the-wild data. His research has been recognized with the Best Paper Award at IEEE IVS 2021 and fellowships from Amazon Go and Qualcomm.

CommonForms: Automatically Making PDFs Fillable

Converting static PDFs into fillable forms remains a surprisingly difficult task, even with the best commercial tools available today. We show that with careful dataset curation and model tuning, it is possible to train high-quality form field detectors for under $500. As part of this effort, we introduce CommonForms, a large-scale dataset of nearly half a million curated form images. We also release a family of highly accurate form field detectors, FFDNet-S and FFDNet-L.

About the Speaker

Joe Barrow is a researcher at Pattern Data, specializing in document AI and information extraction. He previously worked at the Adobe Document Intelligence Lab after receiving his PhD from the University of Maryland in 2022.

Visual Document Retrieval: How to Cluster, Search and Uncover Biases in Document Image Datasets Using Embeddings

In this talk you'll learn about the task of visual document retrieval, the models which are widely used by the community, and see them in action through the open source FiftyOne App where you'll learn how to use these models to identify groups and clusters of documents, find unique documents, uncover biases in your visual document dataset, and search over your document corpus using natural language.

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.
Network event
Thu, Nov 13 · 12:00 PM EST
•
Online
Nov 13 - Women in AI
Online
31 attendees from 15 groups
Hear talks from experts on the latest topics in AI, ML, and computer vision on November 13.

Date and Location

Nov 13, 2025
9 AM Pacific
Online. Register for the Zoom!

Copy, Paste, Customize! The Template Approach to AI Engineering

Most AI implementations fail because teams treat prompt engineering as ad-hoc experimentation rather than systematic software engineering, leading to unreliable systems that don't scale beyond proof-of-concepts. This talk demonstrates engineering practices that enable reliable AI deployment through standardized prompt templates, systematic validation frameworks, and production observability.

Drawing from experience developing fillable prompt templates currently being validated in production environments processing thousands of submissions, I'll share how Infrastructure as Code principles apply to LLM workflows, why evaluation metrics like BLEU scores are critical for production reliability, and how systematic failure analysis prevents costly deployment issues. Attendees will walk away with understanding of practical frameworks for improving AI system reliability and specific strategies for building more consistent, scalable AI implementations.

About the Speaker

Jeanne McClure is a postdoctoral scholar at NC State's Data Science and AI Academy with expertise in systematic AI implementation and validation. Her research transforms experimental AI tools into reliable production systems through standardized prompt templates, rigorous testing frameworks, and systematic failure analysis. She holds a PhD in Learning, Design and Technology with additional graduate work in data science.

Multimodality with Biases: Understand and Evaluate VLMs for Autonomous Driving with FiftyOne

Do your VLMs really see danger? With FiftyOne, I’ll show you how to understand and evaluate vision-language models for autonomous driving — making risk and bias visible in seconds. We’ll compare models on the same scenes, reveal failures and edge cases, and you’ll see a simple dashboard to decide which data to curate and what to adjust. You’ll leave with a clear, practical, and replicable method to raise the bar for safety.

About the Speaker

Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.

The Heart of Innovation: Women, AI, and the Future of Healthcare

This session explores how Artificial Intelligence is transforming healthcare by enhancing diagnosis, treatment, and patient outcomes. It highlights the importance of diverse and female perspectives in shaping AI solutions that are ethical, empathetic, and human-centered. We will discuss key applications, current challenges, and the future potential of AI in medicine. It’s a forward-looking conversation about how innovation can build a healthier world.

About the Speaker

Karen Sanchez is a Postdoctoral Researcher at the Center of Excellence for Generative AI at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. Her research focuses on AI for Science, spanning computer vision, video understanding, and privacy-preserving machine learning. She is also an active advocate for diversity and outreach in AI, contributing to global initiatives that connect researchers and amplify underrepresented voices in technology.

Language Diffusion Models

Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). Challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens.

Optimizing a likelihood bound provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue.

About the Speaker

Jayita Bhattacharyya is an AI/ML Nerd with a blend of technical speaking & hackathon wizardry! Applying tech to solve real-world problems. The work focus these days is on generative AI. Helping software teams incorporate AI into transforming software engineering.
3 attendees from this group
Network event
Fri, Nov 14 · 12:00 PM EST
•
Online
Nov 14 - Workshop: Document Visual AI with FiftyOne
Online
18 attendees from 16 groups
This hands-on workshop introduces you to document visual AI workflows using FiftyOne, the leading open-source toolkit for computer vision datasets.

Date and Location

Nov 14, 2025
9:00-10:30 AM Pacific
Online. Register for the Zoom

In document understanding, a pixel is worth a thousand tokens. While traditional text-extraction pipelines tokenize and process documents sequentially, modern visual AI approaches can understand document structure, layout, and content directly from images—making them more efficient, accurate, and robust to diverse document formats.

In this workshop you'll learn how to:
- Load and organize document datasets in FiftyOne for visual exploration and analysis
- Compute visual embeddings using state-of-the-art document retrieval models to enable semantic search and similarity analysis
- Leverage FiftyOne workflows including similarity search, clustering, and quality assessment to gain insights from your document collections
- Deploy modern vision-language models for OCR and document understanding tasks that go beyond simple text extraction
- Evaluate and compare different OCR models to select the best approach for your specific use case
Whether you're working with invoices, receipts, forms, scientific papers, or mixed document types, this workshop will equip you with practical skills to build robust document AI pipelines that harness the power of visual understanding. Walk away with reproducible notebooks and best practices for tackling real-world document intelligence challenges.
2 attendees from this group