About us

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.

Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.

  • Are you interested in speaking at a future Meetup?
  • Is your company interested in sponsoring a Meetup?

Send me a DM on LinkedIn

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events

15

  • Network event
    June 25 - AI, ML and Computer Vision Meetup

    Online
    353 attendees from 48 groups

    Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Date, Time and Location

    Jun 25, 2026
    9 AM PST
    Online.
    Register for the Zoom!

    Large-Scale Scene Reconstruction via Local View Transformers

    Transformer-based models have advanced 3D scene reconstruction, but their quadratic attention limits scalability to large scenes. We introduce the Local View Transformer (LVT), which replaces global attention with locality-aware attention over neighboring views, conditioned on relative camera geometry. LVT decodes directly into 3D Gaussian splats with view-dependent color and opacity for high-fidelity rendering. Our approach enables scalable, single-pass reconstruction of large, high-resolution scenes.

    About the Speaker

    Tooba Imtiaz is a PhD candidate in Electrical and Computer Engineering at Northeastern University, working in the Machine Learning Lab. Her research focuses on 3D computer vision, novel view synthesis, and robust machine learning. She has published in top venues including SIGGRAPH Asia, CVPR, and ICLR, and has industry experience at Google.

    Lessons learned from running AI workloads in production

    Dave Hughes will share his “tales from the engine room”: practical insights from operating AI systems at scale, including the challenges of abstraction layers, the realities of data movement and hardware constraints, and why systems thinking is essential for building high-performance, secure, and responsible AI infrastructure.

    About the Speaker

    Dave Hughes is CTO at Stelia. He was formerly CTO at Genesis Cloud, which pioneered what is now commonly known as 'neoclouds', and Principal Engineer/Interim Director of Engineering at Adjust GmbH where he built large-scale data warehousing and processing. Dave has a strong background in software engineering, data engineering, systems admin and network engineering. He has worked in traditional HPC, early GPU-accelerated computing (ML) and now AI.

    Enhancing Low-Field MRI with Deep Super-Resolution for Improved Nipah Virus Neuroimaging

    Advances in deep learning make very-low-field (VLF) MRI systems a viable alternative for in vivo neuroimaging. Zero-shot super-resolution, self-supervised learning, and generative AI were explored to improve the quality of low-field MRI images. We present a framework for the first deployment of a VLF scanner for imaging Nipah virus-inoculated nonhuman primates (NHPs) using a 0.05 T MRI system.

    First, a retrospective simulation study assessed the feasibility of imaging NiV infection at low field, followed by a prospective deployment (0.05 T) that enabled longitudinal imaging. The VLF-NiV imaging was characterized by low image quality and included multiple contrasts. A deep learning-based unpaired domain adaptation (CycleGAN) conditioned on acquisition parameters was used to harmonize contrast, and a simulation-based ResUNet model was used to reduce unwanted noise and preserve T2-weighted structural fidelity. We also highlight studies involving zero-shot super-resolution and denoising experiments that are advantageous for accessible neuroimaging.

    About the Speaker

    Ajay Sharma is a deep learning engineer with a broad background in biomedical image analysis. Ajay’s research focuses on developing advanced deep learning methods for computer-aided disease detection and diagnosis. Current work centers on improving image analysis in magnetic resonance imaging (MRI), with emphasis on low-field MRI (LF-MRI), image acquisition, image enhancement, brain tracking, segmentation, and reporting. Earlier work produced explainable AI (XAI) approaches for chest and pediatric brain imaging that increase clinicians’ confidence in AI-assisted diagnostic systems.

    And Now for Something Completely Different with FiftyOne

    Often the best way to understand what a tool is truly capable of is to use it in ways it was never intended to be used. This session pushes FiftyOne past its computer vision roots through a series of demos: a few practical, some playful, all built with open source code. You'll see how FiftyOne's core building blocks generalize far beyond labeled datasets, and leave with patterns and ideas you can take in your own direction.

    About the Speaker

    Burhan Qaddoumi is an ML DevRel Engineer at Voxel51 and a perpetual "new guy" as a lifelong learner. Burhan is active in communities all across the web and eager to help, learn from, and share with others who demonstrate initiative, interest, and drive.

    3 attendees from this group
  • Network event
    June 30 - Beyond Annotation Tools: Building a Complete Physical AI Data Engine

    Online
    134 attendees from 48 groups

    In this workshop we’ll demonstrate workflows for image and video annotation, instance segmentation, polylines, QA and review, collaborative labeling operations in FiftyOne, and smart data selection strategies that help teams reduce wasted labeling spend.

    Date, Time and Location

    Jun 30, 2026
    9 AM PST
    Online. Register for the Zoom!

    Annotation is no longer just about drawing boxes. Modern physical AI teams need an end-to-end system for labeling, QA, dataset curation, project management, auto-labeling, and video understanding — all tightly integrated into the workflows where models are actually built and evaluated.

    You’ll also get an early look at new agentic labeling workflows powered by “Labeling Agents” — intelligent systems that can learn from text prompts and visual examples to automatically label datasets at scale. We’ll walk through how teams can rapidly create reusable labeling agents, validate outputs, and apply them across large datasets as background tasks.

    Whether you’re building computer vision models for robotics, autonomous systems, manufacturing, retail, or multimodal AI applications, this session will show how integrated annotation and data-centric workflows can dramatically accelerate iteration speed while improving dataset quality.

    What You’ll Learn

    • How smart data selection strategies reduce annotation costs and improve model performance
    • Why integrated annotation is becoming a core requirement for modern physical AI platforms
    • How to unify data curation, annotation, evaluation, and model iteration inside a single workflow
    • How FiftyOne supports annotation workflows for classification, object detection, instance segmentation, polylines, and video detection and tracking
    • How to create, edit, QA, and manage 2D and 3D labels directly in context
    • How annotation project management workflows help coordinate labeling teams and reviews
    • How SAM2-powered click-to-segment workflows enable fast browser-based segmentation
    • How agentic labeling works, including training reusable “Labeling Agents”, prompting with text and visual examples, iterating on outputs before deployment, and running large-scale auto-labeling workflows
    1 attendee from this group
  • Network event
    July 1 - Getting Started with FiftyOne

    Online
    112 attendees from 48 groups

    This workshop is part of our Getting Started with FiftyOne monthly series — a recurring session designed to help you build a strong foundation in data-centric AI workflows.

    Date, Time and Location

    July 1, 2026
    9 AM PST - 10 AM PST
    Online.
    Register for the Zoom!

    In this session, you’ll learn how to manage large-scale computer vision datasets using open source FiftyOne. We’ll cover how to curate, visualize, and evaluate your data and models — with a focus on improving data quality over brute-force model iteration.

    You’ll walk away with a repeatable framework for building data-centric AI pipelines across research and production.

    What you’ll learn:

    • Structure unstructured data into queryable schemas (images, video, point clouds)
    • Query datasets using the FiftyOne SDK with filters, tags, and confidence thresholds
    • Visualize high-dimensional embeddings to identify clusters, gaps, and outliers
    • Automate data curation and prioritize high-value samples for labeling
    • Debug model performance using evaluation tools (confusion matrices, PR curves)
    • Customize FiftyOne with dashboards and interactive panels

    Prerequisites:

    • Working knowledge of Python
    • Familiarity with machine learning and/or computer vision fundamentals
    3 attendees from this group
  • Network event
    July 8 - Best of CVPR (Day 1)

    Online
    66 attendees from 48 groups

    Welcome to the Best of CVPR series — your virtual front row to groundbreaking research, insights, and innovations from one of computer vision's premier conferences. Live from the authors to you.

    Date, Time and Location

    Jul 08, 2026
    9 AM - 11 AM PT
    Online.
    Register for Zoom!

    Some Modalities Are More Equal Than Others: Understanding and Improving Multimodal Integration in MLLMs

    Multimodal large language models can process vision, audio, and text, but it remains unclear whether they truly integrate these modalities or rely on shortcut cues. In this talk, I will present our recent work, “Some Modalities Are More Equal Than Others,” where we introduce MMA-Bench, a benchmark designed to probe MLLMs under controlled audio–visual conflict, misleading text, and modality-specific queries. Through black-box evaluation and white-box attention analysis, we show that current MLLMs often struggle when modalities disagree, exhibit model-specific modality biases, and can be distracted by irrelevant textual context. We further propose an alignment-aware tuning strategy that trains models to answer based on the queried modality, improving robustness and multimodal grounding. This talk will highlight both the failure modes of current MLLMs and practical directions toward more reliable cross-modal reasoning.

    About the Speaker

    Tianle Chen is a Ph.D. student in Computer Science at Boston University, advised by Prof. Deepti Ghadiyaram. His research focuses on multimodal large language models, audio–visual reasoning, robustness, and trustworthy multimodal AI. He is interested in understanding how models allocate evidence across modalities and designing methods that improve reliable multimodal reasoning.

    LinkedOut: Linking World Knowledge Out of Video LLMs for Next-Generation Video Recommendation

    This CVPR 2026 work links structured world knowledge representations out of Video LLMs for next-generation video recommendation, covering how large vision-language models can provide rich semantic priors for video understanding while addressing efficiency and deployment challenges in real recommendation systems.

    About the Speaker

    Haichao Zhang is a Ph.D. candidate in Computer Engineering at Northeastern University. His research focuses on computer vision, vision-language models, video understanding and generation, and efficient multimodal foundation models. He has research experience at Google CoreML, Meta Reality Labs, LinkedIn Video AI, Amazon AWS AI Labs, and Tencent.

    CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation

    This paper presents CylinderDepth, a self-supervised surround depth estimation method leveraging cylindrical spatial attention for multi-view consistency across camera rigs.

    About the Speaker

    Samer Abualhanud is a PhD student and research staff member at Leibniz University Hannover, Germany, supervised by Dr.-Ing. Max Mehltretter and Prof. Christian Heipke. Samer's research focuses on multi-view consistency in 3D reconstruction.

    Your ViT is Secretly Also a Video Segmentation Model

    Existing online video segmentation models typically combine a per-frame segmentation module with complex, specialized tracking modules. This work shows that a plain Vision Transformer encoder with a lightweight temporal module can match that performance, resulting in VidEoMT — up to 5–10x faster, running at up to 160 FPS with a ViT-L encoder.

    About the Speaker

    Daan de Geus is an Assistant Professor in the Mobile Perception Systems Lab at TU/e. He received his PhD (cum laude) from TU/e in 2024, and his research focuses on machine learning for visual and multimodal scene understanding.

    1 attendee from this group

Members

498