
About us
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we'll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (10)

May 6 - Building Composable Computer Vision Workflows in FiftyOne
This workshop explores the FiftyOne plugin framework to build custom computer vision applications. You'll learn to extend the open source FiftyOne App with Python-based panels and server-side operators, as well as integrate external tools for labeling, vector search, and model inference into your dataset views.
Date, Time, and Location
May 6, 2026
10 AM - 11 AM PST
Online. Register for the Zoom!
What You'll Learn
- Build Python plugins. Define plugin manifests and directory structures to register custom functionality within the FiftyOne ecosystem.
- Develop server-side operators. Write functions to execute model inference, data cleaning, or metadata updates from the App interface (a minimal sketch follows this list).
- Build interactive panels. Create custom UI dashboards to visualize model metrics or specialized dataset distributions.
- Manage operator execution contexts. Pass data between the App front end and your backend to build dynamic user workflows.
- Implement delegated execution. Configure background workers to handle long-running data processing tasks without blocking the user interface.
- Build labeling integrations. Streamline the flow of data between FiftyOne and annotation platforms through custom triggers and ingestion scripts.
- Extend vector database support. Program custom connectors for external vector stores to enable semantic search across large sample datasets.
- Package and share plugins. Distribute your extensions internally and externally.
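For a taste of the operator workflow covered above, here is a minimal sketch based on FiftyOne's documented plugin API. The operator name and its logic are illustrative placeholders, not workshop materials, and a real plugin also needs a fiftyone.yml manifest that lists the operator.

# Minimal server-side operator sketch (names are hypothetical)
import fiftyone.operators as foo

class CountSamples(foo.Operator):
    @property
    def config(self):
        # Registers the operator under a name the App can invoke
        return foo.OperatorConfig(
            name="count_samples",
            label="Count samples in the current dataset",
        )

    def execute(self, ctx):
        # ctx exposes the current dataset/view, params, and trigger hooks
        return {"count": len(ctx.dataset)}

def register(plugin):
    # Called by FiftyOne when the plugin is loaded
    plugin.register(CountSamples)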

May 7 - Visual AI in Healthcare
Join us to hear experts on cutting-edge topics at the intersection of AI, ML, computer vision, and healthcare.
Date, Time, and Location
May 7, 2026
9 AM PST
Online. Register for the Zoom!
Representation Learning Under Weak Supervision in Computational Pathology
Computational pathology has advanced rapidly with deep learning and, more recently, pathology foundation models that provide strong transferable representations from whole-slide images. Yet important gaps remain: pretrained features often retain domain shift relative to downstream clinical datasets, and most existing pipelines do not explicitly model the geometric organization of tissue architecture that underlies disease progression.
In this talk, I will present our work on weak- and semi-supervised representation learning methods designed to address these challenges, including adaptive stain separation for contrastive learning, bag-label-aware contrastive pretraining for multiple-instance learning, and distance-aware spatial modeling that injects tissue geometry into slide-level prediction. These methods reduce dependence on dense annotations while improving the quality, robustness, and clinical relevance of learned representations in histopathology. Across kidney and prostate cancer studies, they produce stronger downstream performance than standard self-supervised, semi-supervised, and MIL baselines, including improved classification on ccRCC datasets and more accurate prediction of metastatic risk from diagnostic prostate biopsies.
About the Speaker
Dr. Tolga Tasdizen is Professor and Associate Chair of Electrical and Computer Engineering and a faculty member of the Scientific Computing and Imaging Institute at the University of Utah, where he works on AI and machine learning for image analysis with applications in biomedical imaging, public health, and materials science. His research spans self- and semi-supervised learning, domain adaptation, and interpretability.
Efficient and Reliable AI for Real-World Healthcare Deployment
Healthcare is one of the highest-impact domains for AI, yet reliable deployment at scale remains difficult. To truly improve patient care and clinical workflows, AI must operate under real clinical constraints, not just in ideal lab settings. In practice, deployment is limited by high compute and memory costs, scarce labeled data, and distribution shifts across sites and time. Many clinically important findings are also rare and long-tailed, which makes generalization especially challenging. My research makes deployability a design objective by developing methods that stay accurate under strict resource and data constraints.
In this talk, I will first discuss high-performance lightweight deep learning architectures built by redesigning core building blocks. I will then present training-time generative supervision strategies that improve data efficiency and generalization to rare and long-tailed cases with no inference overhead. I will conclude with a forward-looking direction toward real-time perception for surgical assistance, where reliable performance under strict constraints is non-negotiable.
About the Speaker
Md Mostafijur Rahman is a Ph.D. candidate at The University of Texas at Austin, advised by Radu Marculescu. His research sits at the intersection of AI, biomedical imaging, and computer vision, with a focus on building efficient, reliable, and scalable AI systems for deployment in healthcare under real-world constraints. His work has been translated to practice through research internships at GE Healthcare, the National Institutes of Health (NIH), and Bosch Research.
VIGIL: Vectors of Intelligent Guidance in Long-Reach Rural Healthcare
VIGIL (Vectors of Intelligent Guidance in Long-Reach Rural Healthcare) is an AI-driven system designed to support generalist clinicians through interactive, multimodal guidance. The system combines perception, language understanding, and tool use to assist with tasks such as ultrasound acquisition and interpretation in real time. In this talk, we focus on the overall system architecture, highlighting how individual components, ranging from visual models to medical reasoning agents, interact to produce coherent guidance. We also discuss key challenges we have encountered, including tool orchestration, latency, and robustness across components. This presentation aims to provide a systems-level perspective on building embodied AI agents for real-world healthcare settings.
About the Speaker
Andrew Krikorian is a Ph.D. student in Robotics at the University of Michigan, where he is a member of the Corso Group (COG). His research focuses on building physically grounded AI agents that combine perception, tool use, and planning to operate effectively in real-world environments, with a particular emphasis on healthcare applications. He is actively involved in the ARPA-H PARADIGM program, developing intelligent systems for rural clinical settings.
Scaling Healthcare AI with Synthetic Data and World Models
The scarcity of labeled, privacy-compliant medical imaging data remains one of the biggest bottlenecks in healthcare AI development. Emerging world models are changing this landscape by generating high-fidelity synthetic data, from radiology scans to surgical scene simulations, that can augment real-world datasets without compromising patient privacy. However, synthetic data is only as valuable as your ability to curate, validate, and evaluate it alongside real clinical data. In this talk, we explore how teams are using FiftyOne to build rigorous quality pipelines around synthetic medical imagery, enabling them to detect distribution gaps, measure model performance across rare pathologies, and ensure that generated samples meaningfully improve downstream diagnostics. We'll walk through practical workflows that combine world model outputs with real-world medical datasets to accelerate Visual AI in healthcare, responsibly and at scale.
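As a rough illustration of the curation workflow described above (a sketch, not code from the talk), one way to surface distribution gaps is to tag real and synthetic samples, embed them with a single model, and inspect the joint embedding space in the App. The dataset names below are hypothetical; the Brain and zoo calls are FiftyOne's documented API.

# Illustrative sketch: compare real vs. synthetic samples in one embedding space
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

real = fo.load_dataset("real-scans")             # hypothetical dataset name
synthetic = fo.load_dataset("synthetic-scans")   # hypothetical dataset name

real.tag_samples("real")
synthetic.tag_samples("synthetic")

combined = fo.Dataset("real-vs-synthetic")
combined.add_samples(real)
combined.add_samples(synthetic)

# Embed everything with one model and project to 2D; clusters containing
# only synthetic samples suggest a distribution gap worth investigating
model = foz.load_zoo_model("clip-vit-base32-torch")
fob.compute_visualization(combined, model=model, brain_key="real_vs_synth")

session = fo.launch_app(combined)  # color by tags in the embeddings panel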
About the Speaker
Daniel Gural is an expert in Physical AI and has been working in the field for over 8 years. Across healthcare, he has experience both with operational use cases and with using Visual AI as an aid in psychology applications.

May 11 - Best of 3DV 2026
Welcome to the Best of 3DV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year's conference. Live streaming from the authors to you.
Date, Time, and Location
May 11, 2026
9 AM Pacific
Online. Register for the Zoom!
Navigating a 3D Vision Conference with VLMs and Embeddings
Attending the 3D Vision Conference means confronting 177 accepted papers across 3.5 days, far more than any one person can absorb. Skimming titles the night before isn't enough.
This talk shows how to build a systematic, interactive map of an entire conference using modern open-source tools. We load all 177 papers from 3DV 2026 (full PDF page images plus metadata) into a FiftyOne grouped dataset. We then run three annotation passes using Qwen3.5-9B on each cover page: topic classification, author affiliation extraction, and project page detection. Document embeddings from Jina v4 are computed across all 3,019 page images, pooled to paper-level vectors, and fed into FiftyOne Brain for UMAP visualization, similarity search, representativeness scoring, and uniqueness scoring.
The result is an interactive dataset you can query, filter, and explore in the FiftyOne App. Sort by uniqueness to find distinctive work, filter by topic and sort by representativeness to understand each research area, and cross-reference with scheduling metadata to build a personal agenda.
I demonstrate the end-to-end pipeline and discuss design decisions regarding grouped datasets, reasoning model output parsing, and embedding pooling strategies.
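To make the pipeline concrete, here is a hedged sketch of the Brain calls such a workflow might chain together, assuming pooled paper-level embeddings are already stored in a sample field; the dataset, field, and brain-key names are placeholders.

# Illustrative FiftyOne Brain steps for the workflow described above
import fiftyone as fo
import fiftyone.brain as fob

papers = fo.load_dataset("3dv-2026-papers")  # hypothetical dataset name

# UMAP projection of the pooled embeddings for the App's embeddings panel
fob.compute_visualization(
    papers, embeddings="embedding", method="umap", brain_key="papers_umap"
)

# Similarity index enables "find papers like this one" queries
fob.compute_similarity(papers, embeddings="embedding", brain_key="papers_sim")

# Scalar scores to sort and filter by in the App
fob.compute_uniqueness(papers, embeddings="embedding")
fob.compute_representativeness(papers, embeddings="embedding")

session = fo.launch_app(papers)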
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.
Seeing Through Clutter: Structured 3D Scene Reconstruction via Iterative Object Removal
We present SeeingThroughClutter, a method for reconstructing structured 3D representations from single images by segmenting and modeling objects individually. Prior approaches rely on intermediate tasks such as semantic segmentation and depth estimation, which often underperform in complex scenes, particularly in the presence of occlusion and clutter.
We address this by introducing an iterative object removal and reconstruction pipeline that decomposes complex scenes into a sequence of simpler subtasks. Using VLMs as orchestrators, foreground objects are removed one at a time via detection, segmentation, object removal, and 3D fitting. We show that removing objects allows for cleaner segmentations of subsequent objects, even in highly occluded scenes. Our method requires no task-specific training and benefits directly from ongoing advances in foundation models. We demonstrate state-of-the-art robustness on 3D-Front and ADE20K datasets.
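The loop structure is easy to picture; the following is a generic sketch of the iterative decomposition described above, not the authors' code, and every function in it is a placeholder.

# Generic sketch of iterative object removal and reconstruction
def reconstruct_scene(image, detect, segment, remove, fit_3d):
    objects_3d = []
    # Peel off foreground objects one at a time until none remain
    while (detection := detect(image)) is not None:
        mask = segment(image, detection)        # isolate the frontmost object
        objects_3d.append(fit_3d(image, mask))  # fit a 3D model to it
        image = remove(image, mask)             # inpaint it away, simplifying
                                                # segmentation of later objects
    return objects_3d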
About the Speaker
Rio Aguina-Kang is currently a Machine Learning Engineer at Drafted AI, a startup focused on generative architecture. He has previously worked at Adobe Research, Brown Visual Computing, and the Stanford Institute for Human-Centered Artificial Intelligence. He is broadly interested in building systems that let users generate and control visual content through structured representations that reflect their intent.
Physically Realistic 4D Generation
Generating dynamic 3D content that moves and deforms over time is a key frontier in visual computing, with applications in VR/AR, robotics, and digital humans. In this talk, I present our series of works on physically realistic 4D generation: from neural surface deformation with explicit velocity fields (ICLR 2025) to our 4Deform framework for robust shape interpolation (CVPR 2025). Both methods use implicit neural representations with physically constrained velocity fields that enforce volume preservation, spatial smoothness, and geometric consistency. I will also introduce TwoSquared (3DV 2026, oral), which achieves full 4D generation from just two 2D image pairs, demonstrating a practical path toward controllable, physically plausible 4D content creation.
About the Speaker
Lu Sang is a PhD researcher in Computer Vision at TU Munich (Prof. Daniel Cremers), specializing in 3D/4D reconstruction, neural implicit surfaces, and inverse rendering, with several publications at top venues including CVPR, ICLR, and ECCV. She is currently a research intern at Google XR in Zurich. With a strong mathematical foundation and a track record spanning photometric stereo to 4D generation, she brings both theoretical depth and hands-on engineering to cutting-edge visual computing research.
Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception
How can we perceive and use objects given only a few images, without training a new model? We present NeMO, a novel object representation that enables 6DoF object pose estimation, detection, and segmentation given only a handful of RGB images of an unknown object.
About the Speaker
Sebastian Jung studied physics at LMU Munich. He started his PhD in Computer Science at the German Aerospace Center (DLR) in 2025 and works on object-centric few-shot perception, with a focus on robotic applications. Additionally, he is a student researcher at Google, focusing on computer vision algorithms for XR.

May 12 - Best of 3DV 2026
Welcome to the Best of 3DV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year's conference. Live streaming from the authors to you.
Date, Time, and Location
May 12, 2026
9 AM Pacific
Online. Register for the Zoom!
Precise Lighting Control in Diffusion Models by Drawing Shadows
Diffusion models can now be used as powerful neural rendering engines that can be leveraged for realistically inserting virtual objects into images. However, unlike traditional 3D rendering engines (e.g., Blender), they lack precise control over the lighting, an essential requirement in an artistic workflow. We demonstrate that fine-grained lighting control can be achieved for object relighting simply by specifying the desired shadow of the object and injecting it into the diffusion denoising process. The model then produces a realistic relighting of the object consistent with the input shadow direction.
About the Speaker
Frédéric Fortier-Chouinard is a PhD student at Laval University, advised by Prof. Jean-François Lalonde. His research focuses on adding precise physical controls to diffusion-based image and video generation methods, in particular lighting and camera control.
SmokeSeer: 3D Gaussian Splatting for Smoke Removal and Scene Reconstruction
Smoke in real-world scenes can severely degrade image quality and hamper visibility. Recent image restoration methods either rely on data-driven priors that are susceptible to hallucinations, or are limited to static low-density smoke. We introduce SmokeSeer, a method for simultaneous 3D scene reconstruction and smoke removal from multi-view video sequences. Our method uses thermal and RGB images, leveraging the reduced scattering in thermal images to see through smoke. We build upon 3D Gaussian splatting to fuse information from the two image modalities, and decompose the scene into smoke and non-smoke components. Unlike prior work, SmokeSeer handles a broad range of smoke densities and adapts to temporally varying smoke. We validate our method on synthetic data and a new real-world smoke dataset with RGB and thermal images.
About the Speaker
Neham Jain is a Research Scientist at Meshy AI focused on 3D generative models and multimodal learning. He holds an M.S. in Robotics from Carnegie Mellon University and works at the intersection of 3D vision, neural rendering, and scalable AI systems.
Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption
Depth estimation from monocular video has become a key component of many real-world computer vision systems. Recently, Video Depth Anything (VDA) has demonstrated strong performance on long video sequences. However, it relies on batch-processing which prohibits its use in an online setting. In this work, we overcome this limitation and introduce online VDA (oVDA). The key innovation is to employ techniques from Large Language Models (LLMs), namely, caching latent features during inference and masking frames at training. Our oVDA method outperforms all competing online video depth estimation methods in both accuracy and VRAM usage. Low VRAM usage is particularly important for deployment on edge devices. We demonstrate that oVDA runs at 42 FPS on an NVIDIA A100 and at 20 FPS on an NVIDIA Jetson edge device.
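For intuition only, here is a generic sketch of the rolling latent-feature cache idea, not the oVDA authors' code; the encoder and decoder modules are placeholders.

# Generic sketch of per-frame inference with a bounded latent cache
from collections import deque

import torch

class OnlineDepthEstimator:
    def __init__(self, encoder, decoder, cache_size=8):
        self.encoder = encoder  # per-frame feature extractor (placeholder)
        self.decoder = decoder  # temporal depth head (placeholder)
        self.cache = deque(maxlen=cache_size)  # bounded cache keeps VRAM flat

    @torch.no_grad()
    def step(self, frame):
        # Encode only the newest frame; older latents are reused from the cache
        self.cache.append(self.encoder(frame))
        # Predict depth from the cached window of latent features
        return self.decoder(torch.stack(list(self.cache), dim=1))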
About the Speaker
Johann-Friedrich Feiden is a PhD student at Universität Heidelberg specializing in computer vision and machine learning. During his bachelor's, he focused on self-supervised representations; during his master's, he shifted his focus toward computer vision.
Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos
Novel view synthesis of dynamic scenes from monocular video tends to break down once the camera deviates far from the training trajectory, leaving applications in mixed reality, autonomous driving, and immersive media without reliable wide-angle renderings. We present ExpanDyNeRF, a dynamic NeRF that broadens the reliable synthesis range to large-angle rotations by leveraging Gaussian splatting priors as pseudo ground truth to jointly refine density and color at novel viewpoints. To benchmark side-view fidelity, an axis largely missing from prior datasets, we introduce SynDM, the first synthetic dynamic multi-view dataset with paired primary and rotated views, built on a custom GTA V pipeline. Across SynDM, DyNeRF, and NVIDIA, ExpanDyNeRF substantially outperforms prior dynamic NeRF and Gaussian methods under extreme viewpoint shifts.
We close by previewing PanoWorld, our follow-up that pushes view expansion to its natural limit, namely geometry-consistent 360° panoramic video generation from a single image and text prompt.
About the Speaker
Le Jiang is a Ph.D. student in the Augmented Cognition Lab (ACLab) at Northeastern University, advised by Prof. Sarah Ostadabbas. His research centers on 3D scene reconstruction and novel view synthesis for dynamic scenes, with recent work extending dynamic NeRFs to large-angle viewpoints and pushing view synthesis toward geometry-consistent 360° panoramic video world models.
Past events (225)

