
What we're about
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we'll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Send me a DM on LinkedIn.
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events
Nov 24 - Best of ICCV (Day 4)
Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year's conference. Live streaming from the authors to you.
When and Where
Nov 24, 2025
9 AM Pacific
Online. Register for the Zoom!
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Are Vision-Language Models Ready for Physical AI? Humans easily understand how objects move, rotate, and shift, while current AI models that connect vision and language still make mistakes in what seem like simple situations: deciding "left" versus "right" when something is moving, recognizing how perspective changes, or keeping track of motion over time. To reveal these kinds of limitations, we created VLM4D, a testing suite made up of real-world and synthetic videos, each paired with questions about motion, rotation, perspective, and continuity. When we put modern vision-language models through these challenges, they performed far below human levels, especially when visual cues must be combined or the sequence of events must be maintained. But there is hope: new methods such as reconstructing visual features in 4D and fine-tuning focused on space and time show noticeable improvement, bringing us closer to AI that truly understands a dynamic physical world.
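At its core, evaluating a model on a suite like this reduces to scoring answers against ground truth per question type. A minimal sketch, assuming a simple JSON layout and a generic model wrapper (both hypothetical, not VLM4D's actual format):

```python
# Hedged sketch of a benchmark evaluation loop; the item schema and
# model.answer() wrapper are assumptions for illustration only.
import json
from collections import defaultdict

def evaluate(model, benchmark_path):
    with open(benchmark_path) as f:
        items = json.load(f)  # assumed: [{"video", "question", "category", "answer"}]

    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        pred = model.answer(video=item["video"], question=item["question"])
        total[item["category"]] += 1
        correct[item["category"]] += pred.strip().lower() == item["answer"].strip().lower()

    # Per-category accuracy, e.g. motion, rotation, perspective, continuity
    return {cat: correct[cat] / total[cat] for cat in total}
```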
About the Speaker
Shijie Zhou is a final-year PhD candidate at UCLA, recipient of the 2026 Dissertation Year Award and the Graduate Dean's Scholar Award. His research focuses on spatial intelligence, spanning 3D/4D scene reconstruction and generation, vision-language models, generative AI, and interactive agentic systems. His work has been recognized at top conferences including CVPR, ICCV, ECCV, ICLR, and NeurIPS, and has also led to practical impact through research internships at Google and Apple.
DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization
We tackle the challenge of jointly personalizing content and style from a few examples. A promising approach is to train separate Low-Rank Adapters (LoRA) and merge them effectively, preserving both content and style. Existing methods, such as ZipLoRA, treat content and style as independent entities, merging them by learning masks in LoRA's output dimensions. However, content and style are intertwined, not independent. To address this, we propose DuoLoRA, a content-style personalization framework featuring three key components: (i) rank-dimension mask learning, (ii) effective merging via layer priors, and (iii) Constyle loss, which leverages cycle-consistency in the merging process. First, we introduce ZipRank, which performs content-style merging within the rank dimension, offering adaptive rank flexibility and significantly reducing the number of learnable parameters.
Additionally, we incorporate SDXL layer priors to apply implicit rank constraints informed by each layer's content-style bias and adaptive merger initialization, enhancing the integration of content and style. To further refine the merging process, we introduce Constyle loss, which leverages the cycle-consistency between content and style. Our experimental results demonstrate that DuoLoRA outperforms state-of-the-art content-style merging methods across multiple benchmarks.
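To make the rank-dimension idea concrete, here is a minimal sketch of merging two LoRA adapters with one learnable gate per rank, in the spirit of ZipRank; the actual DuoLoRA formulation, layer priors, and Constyle loss are in the paper.

```python
# Hedged sketch of rank-dimension LoRA merging (illustrative only).
import torch
import torch.nn as nn

class RankGatedMerge(nn.Module):
    def __init__(self, lora_content, lora_style):
        super().__init__()
        # Each adapter is a (B, A) pair: B is (d_out, r), A is (r, d_in)
        self.Bc, self.Ac = lora_content
        self.Bs, self.As = lora_style
        # One learnable gate per rank: far fewer parameters than
        # learning masks over the output dimensions
        self.gate_c = nn.Parameter(torch.ones(self.Ac.shape[0]))
        self.gate_s = nn.Parameter(torch.ones(self.As.shape[0]))

    def delta_weight(self):
        # Scale each rank-1 component before summing the two adapters;
        # the result is added to the frozen base weight W
        delta_c = self.Bc @ torch.diag(self.gate_c) @ self.Ac
        delta_s = self.Bs @ torch.diag(self.gate_s) @ self.As
        return delta_c + delta_s
```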
About the Speaker
Aniket Roy is a PhD student in Computer Science at Johns Hopkins University. Prior to that, he earned a Master's degree from the Indian Institute of Technology Kharagpur. During his Master's program, he demonstrated strong research capabilities, publishing multiple papers in prestigious conferences and journals (including ICIP, CVPR Workshops, TCSVT, and IWDW). He was recognized with the Best Paper Award at IWDW 2016 and the Markose Thomas Memorial Award for the best research thesis at the Master's level. Aniket continued to pursue research as a PhD student under the guidance of renowned vision researcher Professor Rama Chellappa at Johns Hopkins University. There, he has explored few-shot learning, multimodal learning, diffusion models, LLMs, and LoRA merging through publications in leading venues such as NeurIPS, ICCV, TMLR, WACV, and CVPR. He has also gained valuable industrial experience through internships at esteemed organizations, including Amazon, Qualcomm, MERL, and SRI International. He was named an Amazon Fellow (2023-24) at JHU and invited to attend the ICCV'25 doctoral consortium.
Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting
CLIP is a foundational model with transferable classification performance in the few-shot setting. Several methods have shown improved performance of CLIP using few-shot examples. However, so far, all these techniques have been benchmarked using standard few-shot datasets. We argue that this mode of evaluation does not provide a true indication of inductive generalization ability from few-shot examples. Because most of these datasets have been seen by the CLIP model during pretraining, the resulting setting can be termed partially transductive. To address this, we propose a pipeline that uses an unlearning technique to obtain true inductive baselines. In this new inductive setting, methods show a significant drop in performance (-55% on average among 13 baselines across multiple datasets). We validate the unlearning technique using oracle baselines. We also propose an improved few-shot classification technique that consistently obtains state-of-the-art performance over 13 recent baseline methods in a comprehensive analysis of 5880 experiments, varying the datasets, the number of few-shot examples, the unlearning setting, and the random seeds. Thus, we identify the issue with the evaluation of CLIP-based few-shot classification, provide a solution using unlearning, propose new benchmarks, and provide an improved method.
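For background, a common few-shot CLIP baseline of the kind this evaluation covers builds class prototypes from the support examples. A minimal sketch using the OpenAI clip package (illustrative, not the authors' proposed method):

```python
# Few-shot CLIP classification via nearest class prototypes.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def build_prototypes(support_images, support_labels, num_classes):
    """Average the normalized embeddings of the few-shot examples per class."""
    feats = model.encode_image(support_images.to(device)).float()
    feats = feats / feats.norm(dim=-1, keepdim=True)
    protos = torch.stack(
        [feats[support_labels == c].mean(0) for c in range(num_classes)]
    )
    return protos / protos.norm(dim=-1, keepdim=True)

@torch.no_grad()
def classify(query_images, protos):
    feats = model.encode_image(query_images.to(device)).float()
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats @ protos.T).argmax(dim=-1)  # nearest prototype by cosine
```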
About the Speaker
Alexey Kravets is a PhD student in AI at the University of Bath, with over five years of experience as a Lead Data Scientist at Aviva. His current research focuses on vision and language models, few-shot learning, machine unlearning, and mechanistic interpretability. Before his PhD, he led significant machine learning projects at Aviva, a FTSE 100 insurer in the UK, including the development of NLP tools for insurance predictions. His passion for AI extends into writing, and he regularly shares insights through articles on Medium.
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3)
Tracking and forecasting the rotation of objects is fundamental in computer vision and robotics, yet SO(3) extrapolation remains challenging as (1) sensor observations can be noisy and sparse, (2) motion patterns can be governed by complex dynamics, and (3) application settings can demand long-term forecasting. This work proposes modeling continuous-time rotational object dynamics on SO(3) using Neural Controlled Differential Equations guided by Savitzky-Golay paths. Unlike existing methods that rely on simplified motion assumptions, our method learns a general latent dynamical system of the underlying object trajectory while respecting the geometric structure of rotations. Experimental results on real-world data demonstrate compelling forecasting capabilities compared to existing approaches.
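To illustrate the Savitzky-Golay ingredient, here is a hedged sketch of smoothing a noisy quaternion trajectory before fitting a dynamics model; the paper's method builds control paths for a Neural CDE on SO(3), which this does not reproduce.

```python
# Savitzky-Golay smoothing of a rotation trajectory (illustrative sketch).
import numpy as np
from scipy.signal import savgol_filter

def smooth_quaternions(quats, window=11, polyorder=3):
    """quats: (T, 4) array of unit quaternions sampled over time."""
    # Enforce a consistent hemisphere so filtering never averages q with -q,
    # which represent the same rotation
    q = quats.copy()
    for t in range(1, len(q)):
        if np.dot(q[t], q[t - 1]) < 0:
            q[t] = -q[t]
    # Filter each component over time, then project back to unit quaternions
    q = savgol_filter(q, window_length=window, polyorder=polyorder, axis=0)
    return q / np.linalg.norm(q, axis=1, keepdims=True)
```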
About the Speaker
Lennart Bastian is a PhD candidate at TU Munich's CAMP lab under Prof. Nassir Navab, and an incoming research fellow at Imperial College London. Originally trained in applied mathematics (with early stints in NYC and California's tech scene), he found his calling at the intersection of geometry, machine learning, and clinical applications. His work focuses on making sense of the real world in 3D, teaching computers to understand geometry and what happens in complex surgical environments.
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
Neural Radiance Field (NeRF)-based segmentation methods focus on object semantics and rely solely on RGB data, lacking intrinsic material properties. This limitation restricts accurate material perception, which is crucial for robotics, augmented reality, simulation, and other applications. We introduce UnMix-NeRF, a framework that integrates spectral unmixing into NeRF, enabling joint hyperspectral novel view synthesis and unsupervised material segmentation. Our method models spectral reflectance via diffuse and specular components, where a learned dictionary of global endmembers represents pure material signatures, and per-point abundances capture their distribution. For material segmentation, we use spectral signature predictions along learned endmembers, allowing unsupervised material clustering. Additionally, UnMix-NeRF enables scene editing by modifying learned endmember dictionaries for flexible material-based appearance manipulation. Extensive experiments validate our approach, demonstrating superior spectral reconstruction and material segmentation to existing methods.
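The underlying linear mixing model is compact: each point's reflectance is a weighted combination of endmember signatures. A minimal sketch, with array names and shapes as assumptions rather than UnMix-NeRF's API:

```python
# Linear spectral mixing and abundance-based material assignment.
import numpy as np

def mix_spectra(abundances, endmembers):
    """abundances: (N, M) per-point weights, rows summing to 1;
    endmembers: (M, B) pure material signatures over B spectral bands.
    Returns the (N, B) predicted reflectance."""
    return abundances @ endmembers

def material_labels(abundances):
    # Unsupervised segmentation: assign each point its dominant endmember
    return abundances.argmax(axis=1)
```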
About the Speaker
Fabian Perez is a computer science student at Universidad Industrial de Santander (UIS) in Colombia, where he is currently pursuing a master's degree. He has strong skills in software development and deep learning, and combines the two areas to create innovative solutions.
Dec 4 - AI, ML and Computer Vision Meetup
Join the virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Register for the Zoom
Date and Time
Dec 4, 2025
9:00 - 11:00 AM Pacific
Benchmarking Vision-Language Models for Autonomous Driving Safety
This workshop introduces a unified framework for evaluating how vision-language models handle driving safety. Using an enhanced BDDOIA dataset with scene, weather, and action labels, we benchmark models like Gemini, FastVLM, and Qwen within FiftyOne. Our results show consistent blind spots where models misjudge unsafe situations, highlighting the need for safer and more interpretable AI systems for autonomous driving.
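As a flavor of the workflow, here is a hedged FiftyOne sketch for surfacing such blind spots; the dataset name and fields (gemini_action, gt_action) are hypothetical, not the workshop's actual schema.

```python
# Querying a FiftyOne dataset for cases where a VLM misjudged safety.
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("bddoia-enhanced")  # assumed pre-loaded dataset

# Surface the blind spots: samples labeled unsafe that the model called safe
misjudged = dataset.match(
    (F("gt_action.label") == "unsafe") & (F("gemini_action.label") == "safe")
)
session = fo.launch_app(misjudged)  # inspect failures interactively
```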
About the Speaker
Adonai Vera is a Machine Learning Engineer and DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO. Today, he connects code and community to build open, production-ready AI, making technology simple, accessible, and reliable.
TrueRice: AI-Powered Visual Quality Control for Rice Grains and Beyond at Scale
Agriculture remains one of the most under-digitized industries, yet grain quality control defines pricing, trust, and livelihoods for millions. TrueRice is an AI-powered analyzer that turns a flatbed scanner into a high-precision, 30-second QC engine, replacing the 2+ hours and subjectivity of manual quality inspection.
Built on a state-of-the-art 8K image processing pipeline with SAHI (Slicing Aided Hyper Inference), it detects fine-grained kernel defects at scale with high accuracy across grain size, shape, breakage, discoloration, and chalkiness. Now being extended to maize and coffee, TrueRice showcases how cross-crop transfer learning and frugal AI engineering can scale precision QC for farmers, millers, and exporters. This talk will cover the design principles, model architecture choices, and a live demonstration, while addressing challenges in data variability, regulatory standards, and cross-crop adaptation.
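For readers unfamiliar with SAHI, a minimal sliced-inference call looks roughly like the following; the model weights, tile sizes, and thresholds are placeholders, not TrueRice's configuration.

```python
# SAHI sliced inference over a large scan (placeholder model and paths).
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",       # e.g. a YOLO kernel-defect detector
    model_path="rice_defects.pt",   # hypothetical weights
    confidence_threshold=0.35,
)

# Slice the 8K scan into overlapping tiles so tiny kernels stay detectable
result = get_sliced_prediction(
    "scan_8k.png",
    detection_model,
    slice_height=1024,
    slice_width=1024,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(len(result.object_prediction_list), "detections")
```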
About the Speaker
Sai Jeevan Puchakayala is an Interdisciplinary AI/ML Consultant, Researcher, and Tech Lead at Sustainable Living Lab (SL2) India, where he drives the development of applied AI solutions for agriculture, climate resilience, and sustainability. He led the engineering of TrueRice, an award-winning grain quality analyzer that won India's first International Agri Hackathon 2025.
WeedNet: A Foundation Model Based Global-to-Local AI Approach for Real-Time Weed Species Identification and Classification
Early and accurate weed identification is critical for effective management, yet current AI-based approaches face challenges due to limited expert-verified datasets and the high variability in weed morphology across species and growth stages. We present WeedNet, a global-scale weed identification model designed to recognize a wide range of species, including noxious and invasive plants. WeedNet is an end-to-end real-time pipeline that integrates self-supervised pretraining, fine-tuning, and trustworthiness strategies to improve both accuracy and reliability.
Building on this foundation, we introduce a Global-to-Local strategy: while the Global WeedNet model provides broad generalization, we fine-tune local variants such as Iowa WeedNet to target region-specific weed communities in the U.S. Midwest. Our evaluation addresses both intra-species diversity (different growth stages) and inter-species similarity (look-alike species), ensuring robust performance under real-world variability. We further validate WeedNet on images captured by drones and ground rovers, demonstrating its potential for deployment in robotic platforms. Beyond field applications, we integrate a conversational AI to enable practical decision-support tools for farmers, agronomists, researchers, and land managers worldwide. These advances position WeedNet as a foundational model for intelligent, scalable, and regionally adaptable weed management and ecological conservation.
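A minimal sketch of the Global-to-Local fine-tuning idea: swap the classifier head of a broadly pretrained model and fine-tune it on a regional subset. The backbone choice and class count are assumptions, not WeedNet's actual recipe.

```python
# Fine-tuning a pretrained backbone on a regional weed subset (sketch).
import torch
import torch.nn as nn
from torchvision import models

NUM_REGIONAL_SPECIES = 60  # hypothetical regional weed community size

# Stand-in for the pretrained global model
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_REGIONAL_SPECIES)

# Fine-tune the new head (optionally later blocks too) at a low learning rate
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```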
About the Speaker
Timilehin Ayanlade is a Ph.D. candidate in the Self-aware Complex Systems Laboratory at Iowa State University, where his research focuses on developing machine learning and computer vision methods for agricultural applications. His work integrates multimodal data across ground-based sensing, UAV, and satellite with advanced AI models to tackle challenges in weed identification, crop monitoring, and crop yield prediction.
Memory Matters: Early Alzheimer's Detection with AI-Powered Mobile Tools
Advancements in artificial intelligence and mobile technology are transforming the landscape of neurodegenerative disease detection, offering new hope for early intervention in Alzheimer's.
By integrating machine learning algorithms with everyday mobile devices, we are entering a new era of accessible, scalable, and non-invasive tools for early Alzheimer's detection.
In this talk, we'll cover the potential of AI in health care systems and the ethical considerations, plus a deep dive into the architecture, models, datasets, and framework.
About the Speaker
Reetam Biswas has more than 18 years of experience in the IT industry as a software architect and currently works on AI.
Dec 11 - Visual AI for Physical AI Use Cases
Join our virtual Meetup to hear talks from experts on cutting-edge topics across Visual AI for Physical AI use cases.
Date, Time and Location
Dec 11, 2025
9:00-11:00 AM Pacific
Online. Register for the Zoom!
From Data to Open-World Autonomous Driving
Data is key for advances in machine learning, including mobile applications like robots and autonomous cars. To ensure reliable operation, the scenarios a system encounters must be reflected in the underlying dataset. Since open-world environments can contain unknown scenarios and novel objects, active learning from online data collection and the handling of unknowns are required. In this talk, we discuss different approaches to addressing these real-world requirements.
About the Speaker
Sebastian Schmidt is a PhD student in the Data Analytics and Machine Learning group at TU Munich and part of an industrial PhD program with the BMW research group. His work focuses mainly on open-world active learning and perception for autonomous vehicles.
From Raw Sensor Data to Reliable Datasets: Physical AI in Practice
Modern mobility systems rely on massive, high-quality multimodal datasets, yet real-world data is messy. Misaligned sensors, inconsistent metadata, and uneven scenario coverage can slow development and lead to costly model failures. The Physical AI Workbench, built in collaboration between Voxel51 and NVIDIA, provides an automated and scalable pipeline for auditing, reconstructing, and enriching autonomous driving datasets.
In this talk, we'll show how FiftyOne serves as the central interface for inspecting and validating sensor alignment, scene structure, and scenario diversity, while NVIDIA Neural Reconstruction (NuRec) enables physics-aware reconstruction directly from real-world captures. We'll highlight how these capabilities support automated dataset quality checks, reduce manual review overhead, and streamline the creation of richer datasets for model training and evaluation.
Attendees will gain insight into how Physical AI workflows help mobility teams scale, improve dataset reliability, and accelerate iteration from data capture to model deployment, without rewriting their infrastructure.
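As a small taste of what an automated audit can look like in FiftyOne (the Workbench's own checks are richer; the dataset and field values here are assumptions):

```python
# Simple metadata audit over an AV dataset in FiftyOne.
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("av-multimodal")  # assumed existing dataset
dataset.compute_metadata()

# Flag samples with missing or inconsistent sensor metadata
missing_meta = dataset.exists("metadata", False)
odd_resolution = dataset.match(F("metadata.width") != 1920)  # assumed nominal width

print(len(missing_meta), "samples lack metadata")
print(len(odd_resolution), "samples have unexpected resolution")
```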
About the Speaker
Daniel Gural leads technical partnerships at Voxel51, where he's building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems. He has a background in developer relations and computer vision engineering.
Building Smarter AV Simulation with Neural Reconstruction and World Models
This talk explores how neural reconstruction and world models are coming together to create richer, more dynamic simulation for scalable autonomous vehicle development. We'll look at the latest releases in 3D Gaussian splatting techniques and world reasoning and generation, and discuss how these technologies are advancing the deployment of autonomous driving stacks that can generalize to any environment. We'll also cover NVIDIA open models, frameworks, and data to help kickstart your own development pipelines.
About the Speaker
Katie Washabaugh is NVIDIA's Product Marketing Manager for Autonomous Vehicle Simulation, focusing on virtual solutions for real-world mobility. A former journalist at publications such as Automotive News and MarketWatch, she joined NVIDIA in 2018 as Automotive Content Marketing Manager. Katie holds a B.A. in public policy from the University of Michigan and lives in Detroit.
Relevance of Classical Algorithms in Modern Autonomous Driving Architectures
While modern autonomous driving systems increasingly rely on machine learning and deep neural networks, classical algorithms continue to play a foundational role in ensuring reliability, interpretability, and real-time performance. Techniques such as Kalman filtering, A* path planning, PID control, and SLAM remain integral to perception, localization, and decision-making modules. Their deterministic nature and lower computational overhead make them especially valuable in safety-critical scenarios and resource-constrained environments. This talk explores the enduring relevance of classical algorithms, their integration with learning-based methods, and their evolving scope in the context of next-generation autonomous vehicle architectures.
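As a concrete example of the determinism and low overhead the talk highlights, here is a minimal 1-D constant-velocity Kalman filter: a handful of cheap matrix operations with fully interpretable state.

```python
# One predict/update step of a 1-D constant-velocity Kalman filter.
import numpy as np

def kalman_step(x, P, z, dt=0.1, q=1e-3, r=0.25):
    """x: state [position, velocity]; P: 2x2 covariance; z: noisy position."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise
    R = np.array([[r]])                     # measurement noise

    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                           # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + (K @ y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Usage: x, P = kalman_step(np.zeros(2), np.eye(2), z=1.02)
```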
About the Speaker
Prajwal Chinthoju is an Autonomous Driving Feature Development Engineer with a strong foundation in systems engineering, optimization, and intelligent mobility. He specializes in integrating classical algorithms with modern AI techniques to enhance perception, planning, and control in autonomous vehicle platforms.
Dec 16 - Building and Auditing Physical AI Pipelines with FiftyOne
This hands-on workshop introduces you to the Physical AI Workbench, a new layer of FiftyOne designed for autonomous vehicle, robotics, and 3D vision workflows. You'll learn how to bridge the gap between raw sensor data and production-quality datasets, all from within FiftyOne's interactive interface.
Date, Time and Location
Dec 16, 2025
9:00-10:00 AM Pacific
Online. Register for the Zoom!
Through live demos, you'll explore how to:
- Audit: Automatically detect calibration errors, timestamp misalignments, incomplete frames, and other integrity issues that arise from dataset format drift over time.
- Generate: Reconstruct and augment your data using NVIDIA pathways such as NuRec, COSMOS, and Omniverse, enabling realistic scene synthesis and physical consistency checks.
- Enrich: Integrate auto-labeling, embeddings, and quality scoring pipelines to enhance metadata and accelerate model training.
- Export and Loop Back: Seamlessly export to and re-import from interoperable formats like NCore to verify consistency and ensure round-trip fidelity.
You'll gain hands-on experience with a complete physical AI dataset lifecycle, from ingesting real-world AV datasets like nuScenes and Waymo, to running 3D audits, projecting LiDAR into image space, and visualizing results in FiftyOne's UI. Along the way, you'll see how the Physical AI Workbench automatically surfaces issues in calibration, projection, and metadata, helping teams prevent silent data drift and ensure reliable dataset evolution.
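One audit primitive mentioned above, projecting LiDAR into image space, reduces to a standard pinhole projection. A minimal numpy sketch with generic matrix names (not the Workbench's API):

```python
# Project LiDAR points into image coordinates via extrinsics + intrinsics.
import numpy as np

def project_lidar(points_lidar, T_cam_lidar, K):
    """points_lidar: (N, 3); T_cam_lidar: 4x4 extrinsics; K: 3x3 intrinsics."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # transform into camera frame
    in_front = pts_cam[:, 2] > 0                 # keep points ahead of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    return uv, in_front
```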
By the end, you'll understand how the Physical AI Workbench standardizes the process of building calibrated, complete, and simulation-ready datasets for the physical world.
Who should attend
Data scientists, AV/ADAS engineers, robotics researchers, and computer vision practitioners looking to standardize and scale physical-world datasets for model development and simulation.
About the Speaker
Daniel Gural leads technical partnerships at Voxel51, where he's building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.
Past events

