
What we're about
This group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we'll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events
9
- Network event
• Online | Nov 12 - Visual AI in Agriculture: Scaling Precision Field Management
Online | 0 attendees from 34 groups
Join us as we reveal how a data-centric visual AI workflow turns drone imagery, satellite data, and ground-truth sensors into a comprehensive field intelligence system.
Date, Time and Location
Nov 12, 2025
10 AM Pacific
Online. Register for the Zoom!
In this webinar you'll learn how multi-spectral visual AI transforms field management from reactive to predictive, enabling precision agriculture at scale. Through live demos and real-world case studies, you'll see how fusing RGB, multispectral, and IoT sensor data accelerates weed detection, optimizes variable rate applications, and maximizes yield potential.
Topics covered:
- Building a field-scale dataset: ingesting drone imagery, satellite data, and IoT sensor streams into a unified visual dataset for comprehensive field analysis
- Curating for accuracy: annotation, validation, and quality control techniques that capture field variability and edge cases traditional scouting techniques miss
- Training & evaluating models: fuse visual, spectral, and environmental features into one pipeline for weed detection, crop health assessment, and zone delineation
- Measuring field impact: align model performance with key metrics like input efficiency, yield variance, and ROI per zone
- Network event
• Online | Nov 13 - Women in AI
Online | 278 attendees from 47 groups
Hear talks from experts on the latest topics in AI, ML, and computer vision on November 13.
Date and Location
Nov 13, 2025
9 AM Pacific
Online. Register for the Zoom!
Copy, Paste, Customize! The Template Approach to AI Engineering
Most AI implementations fail because teams treat prompt engineering as ad-hoc experimentation rather than systematic software engineering, leading to unreliable systems that don't scale beyond proof-of-concepts. This talk demonstrates engineering practices that enable reliable AI deployment through standardized prompt templates, systematic validation frameworks, and production observability.
Drawing from experience developing fillable prompt templates currently being validated in production environments processing thousands of submissions, I'll share how Infrastructure as Code principles apply to LLM workflows, why evaluation metrics like BLEU scores are critical for production reliability, and how systematic failure analysis prevents costly deployment issues. Attendees will walk away with an understanding of practical frameworks for improving AI system reliability and specific strategies for building more consistent, scalable AI implementations.
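As a rough illustration of the fillable prompt template idea, the sketch below uses Python's standard `string.Template` and rejects a submission with missing or blank fields before any model would be called. The template text, the field names, and the `render_prompt` helper are hypothetical and are not taken from the speaker's system.

```python
from string import Template

# Hypothetical template; the wording and field names are illustrative only.
REVIEW_TEMPLATE = Template(
    "You are reviewing a $doc_type.\n"
    "Rubric: $rubric\n"
    "Submission: $submission\n"
    "Return a score from 1 to 5 and a one-sentence justification."
)

REQUIRED_FIELDS = ("doc_type", "rubric", "submission")


def render_prompt(fields: dict) -> str:
    """Fill the template, failing loudly if a required field is missing or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = fields.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    if missing:
        raise ValueError(f"missing template fields: {missing}")
    return REVIEW_TEMPLATE.substitute(fields)


prompt = render_prompt(
    {"doc_type": "lab report", "rubric": "clarity", "submission": "Sample text."}
)
```

Keeping the template and its required fields in version control, alongside checks like this one, is one way to treat prompts as reviewable, testable artifacts.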
About the Speaker
Jeanne McClure is a postdoctoral scholar at NC State's Data Science and AI Academy with expertise in systematic AI implementation and validation. Her research transforms experimental AI tools into reliable production systems through standardized prompt templates, rigorous testing frameworks, and systematic failure analysis. She holds a PhD in Learning, Design and Technology with additional graduate work in data science.
Multimodality with Biases: Understand and Evaluate VLMs for Autonomous Driving with FiftyOne
Do your VLMs really see danger? With FiftyOne, I'll show you how to understand and evaluate vision-language models for autonomous driving, making risk and bias visible in seconds. We'll compare models on the same scenes, reveal failures and edge cases, and you'll see a simple dashboard to decide which data to curate and what to adjust. You'll leave with a clear, practical, and replicable method to raise the bar for safety.
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.
The Heart of Innovation: Women, AI, and the Future of Healthcare
This session explores how Artificial Intelligence is transforming healthcare by enhancing diagnosis, treatment, and patient outcomes. It highlights the importance of diverse and female perspectives in shaping AI solutions that are ethical, empathetic, and human-centered. We will discuss key applications, current challenges, and the future potential of AI in medicine. It's a forward-looking conversation about how innovation can build a healthier world.
About the Speaker
Karen Sanchez is a Postdoctoral Researcher at the Center of Excellence for Generative AI at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. Her research focuses on AI for Science, spanning computer vision, video understanding, and privacy-preserving machine learning. She is also an active advocate for diversity and outreach in AI, contributing to global initiatives that connect researchers and amplify underrepresented voices in technology.
Language Diffusion Models
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). This talk challenges that notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens.
By optimizing a likelihood bound, LLaDA provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue.
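The forward data masking process can be pictured with a small sketch. It assumes each token is masked independently with probability t, for a masking ratio t between 0 and 1; the reverse model (not shown) would then be trained to predict the masked positions. The `MASK` string and the function names are illustrative and do not come from the LLaDA codebase.

```python
import random

MASK = "[MASK]"  # placeholder mask token chosen for this sketch


def forward_mask(tokens, t, rng):
    """Independently replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_positions(noised):
    """Indices the reverse model would be asked to fill in."""
    return [i for i, tok in enumerate(noised) if tok == MASK]


rng = random.Random(0)
tokens = "diffusion models can generate text".split()
t = rng.random()  # masking ratio drawn uniformly from [0, 1)
noised = forward_mask(tokens, t, rng)
targets = masked_positions(noised)
```

At t = 0 the sequence is untouched and at t = 1 every token is masked, so varying t exposes the model to every corruption level between those extremes.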
About the Speaker
Jayita Bhattacharyya is an AI/ML nerd with a blend of technical speaking and hackathon wizardry, applying tech to solve real-world problems. Current work focuses on generative AI and on helping software teams incorporate AI to transform software engineering.
3 attendees from this group
- Network event
• Online | Nov 14 - Workshop: Document Visual AI with FiftyOne
Online | 116 attendees from 47 groups
This hands-on workshop introduces you to document visual AI workflows using FiftyOne, the leading open-source toolkit for computer vision datasets.
Date and Location
Nov 14, 2025
9:00-10:30 AM Pacific
Online. Register for the Zoom
In document understanding, a pixel is worth a thousand tokens. While traditional text-extraction pipelines tokenize and process documents sequentially, modern visual AI approaches can understand document structure, layout, and content directly from images, making them more efficient, accurate, and robust to diverse document formats.
In this workshop you'll learn how to:
- Load and organize document datasets in FiftyOne for visual exploration and analysis
- Compute visual embeddings using state-of-the-art document retrieval models to enable semantic search and similarity analysis
- Leverage FiftyOne workflows including similarity search, clustering, and quality assessment to gain insights from your document collections
- Deploy modern vision-language models for OCR and document understanding tasks that go beyond simple text extraction
- Evaluate and compare different OCR models to select the best approach for your specific use case
Whether you're working with invoices, receipts, forms, scientific papers, or mixed document types, this workshop will equip you with practical skills to build robust document AI pipelines that harness the power of visual understanding. Walk away with reproducible notebooks and best practices for tackling real-world document intelligence challenges.
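For comparing OCR models, one common metric is the character error rate (CER): the edit distance between the model output and a reference transcription, divided by the reference length. The sketch below is a generic, dependency-free illustration of that metric; it is an assumption about how such a comparison could be scored and is not taken from the workshop notebooks or the FiftyOne API.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length; lower is better."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Averaging this score per model over the same set of ground-truth documents gives a simple basis for choosing between OCR approaches.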
3 attendees from this group
- Network event
• Online | Nov 19 - Best of ICCV (Day 1)
Online | 121 attendees from 44 groups
Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year's conference. Live streaming from the authors to you.
Date, Time and Location
Nov 19, 2025
9 AM Pacific
Online. Register for the Zoom!
AnimalClue: Recognizing Animals by their Traces
Wildlife observation plays an important role in biodiversity conservation, necessitating robust methodologies for monitoring wildlife populations and interspecies interactions. Recent advances in computer vision have significantly contributed to automating fundamental wildlife observation tasks, such as animal detection and species identification. However, accurately identifying species from indirect evidence like footprints and feces remains relatively underexplored, despite its importance in contributing to wildlife monitoring.
To bridge this gap, we introduce AnimalClue, the first large-scale dataset for species identification from images of indirect evidence. Our dataset consists of 159,605 bounding boxes encompassing five categories of indirect clues: footprints, feces, eggs, bones, and feathers. It covers 968 species, 200 families, and 65 orders. Each image is annotated with species-level labels, bounding boxes or segmentation masks, and fine-grained trait information, including activity patterns and habitat preferences. Unlike existing datasets primarily focused on direct visual features (e.g., animal appearances), AnimalClue presents unique challenges for classification, detection, and instance segmentation tasks due to the need for recognizing more detailed and subtle visual features. In our experiments, we extensively evaluate representative vision models and identify key challenges in animal identification from their traces.
About the Speaker
Risa Shinoda received her M.S. and Ph.D. in Agricultural Science from Kyoto University in 2022 and 2025. Since April 2025, she has been serving as a Specially Appointed Assistant Professor at the Graduate School of Information Science and Technology, the University of Osaka. She is engaged in research on the application of image recognition to plants and animals, as well as vision-language models.
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Fashion design is a complex creative process that blends visual and textual expressions. Designers convey ideas through sketches, which define spatial structure and design elements, and textual descriptions, capturing material, texture, and stylistic details. In this paper, we present LOcalized Text and Sketch for fashion image generation (LOTS), an approach for compositional sketch-text based generation of complete fashion outlooks. LOTS leverages a global description with paired localized sketch + text information for conditioning and introduces a novel step-based merging strategy for diffusion adaptation.
First, a Modularized Pair-Centric representation encodes sketches and text into a shared latent space while preserving independent localized features; then, a Diffusion Pair Guidance phase integrates both local and global conditioning via attention-based guidance within the diffusion model's multi-step denoising process. To validate our method, we build on Fashionpedia to release Sketchy, the first fashion dataset where multiple text-sketch pairs are provided per image. Quantitative results show LOTS achieves state-of-the-art image generation performance on both global and localized metrics, while qualitative examples and a human evaluation study highlight its unprecedented level of design customization.
About the Speaker
Federico Girella is a third-year Ph.D. student at the University of Verona (Italy), supervised by Prof. Marco Cristani, with expected graduation in May 2026. His research involves joint representations in the Image and Language multi-modal domain, working with deep neural networks such as (Large) Vision and Language Models and Text-to-Image Generative Models. His main body of work focuses on Text-to-Image Retrieval and Generation in the Fashion domain.
ProtoMedX: Explainable Multi-Modal Prototype Learning for Bone Health Assessment
Early detection of osteoporosis and osteopenia is critical, yet most AI models for bone health rely solely on imaging and offer little transparency into their decisions. In this talk, I will present ProtoMedX, the first prototype-based framework that combines lumbar spine DEXA scans with patient clinical records to deliver accurate and inherently explainable predictions.
Unlike black-box deep networks, ProtoMedX classifies patients by comparing them to learned case-based prototypes, mirroring how clinicians reason in practice. Our method not only achieves state-of-the-art accuracy on a real NHS dataset of 4,160 patients but also provides clear, interpretable explanations aligned with the upcoming EU AI Act requirements for high-risk medical AI. Beyond bone health, this work illustrates how prototype learning can make multi-modal AI both powerful and transparent, offering a blueprint for other safety-critical domains.
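The comparison step behind prototype-based classification can be sketched in a few lines. This is a simplified illustration, not the ProtoMedX method: it assumes one fixed prototype vector per class and plain Euclidean distance, whereas the framework described above learns its prototypes from scans and clinical records. The class names and coordinates are toy values invented for the example.

```python
import math


def nearest_prototype(sample, prototypes):
    """Return (label, distance) for the prototype closest to the sample."""
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(sample, proto)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy 2-D embeddings, purely illustrative.
prototypes = {
    "normal": (0.0, 0.0),
    "osteopenia": (1.0, 1.0),
    "osteoporosis": (2.0, 2.0),
}
label, distance = nearest_prototype((0.9, 1.2), prototypes)
```

The explanation comes for free: the prediction is justified by pointing at the closest prototype and how far away it is.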
About the Speaker
Alvaro Lopez is a PhD candidate in Explainable AI at Lancaster University and an AI Research Associate at J.P. Morgan in London. His research focuses on prototype-based learning, multi-modal AI, and AI security. He has led projects on medical AI, fraud detection, and adversarial robustness, with applications ranging from healthcare to financial systems.
CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
We introduce CLASP (Clustering via Adaptive Spectral Processing), a lightweight framework for unsupervised image segmentation that operates without any labeled data or fine-tuning. CLASP first extracts per-patch features using a self-supervised ViT encoder (DINO); then, it builds an affinity matrix and applies spectral clustering. To avoid manual tuning, we select the segment count automatically with an eigengap-silhouette search, and we sharpen the boundaries with a fully connected DenseCRF. Despite its simplicity and training-free nature, CLASP attains competitive mIoU and pixel-accuracy on COCO-Stuff and ADE20K, matching recent unsupervised baselines. The zero-training design makes CLASP a strong, easily reproducible baseline for large unannotated corpora, which are especially common in digital advertising and marketing workflows such as brand-safety screening, creative asset curation, and social-media content moderation.
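The eigengap half of the segment-count search can be illustrated with the standard heuristic: sort the graph Laplacian's eigenvalues in ascending order and pick the k at which the jump to the next eigenvalue is largest. The sketch below takes the eigenvalues as given and covers only that heuristic. How CLASP combines it with the silhouette score is not described here, and the function name and `max_k` parameter are assumptions made for this example.

```python
def eigengap_segment_count(eigenvalues, max_k=None):
    """Pick k maximizing the gap between the k-th and (k+1)-th smallest eigenvalues."""
    vals = sorted(eigenvalues)
    limit = len(vals) - 1 if max_k is None else min(max_k, len(vals) - 1)
    if limit < 1:
        raise ValueError("need at least two eigenvalues and max_k >= 1")
    # gaps[i] is the jump from the (i+1)-th to the (i+2)-th smallest eigenvalue.
    gaps = [vals[k] - vals[k - 1] for k in range(1, limit + 1)]
    return gaps.index(max(gaps)) + 1


# Three near-zero eigenvalues followed by a large jump suggest three segments.
k = eigengap_segment_count([0.0, 0.01, 0.02, 0.9, 0.95, 1.0])
```

A graph with k well-separated clusters has k Laplacian eigenvalues near zero, which is why the largest gap is a reasonable proxy for the number of segments.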
About the Speaker
Max Curie is a Research Scientist at Integral Ad Science, building fast, lightweight solutions for brand safety, multi-media classification, and recommendation systems. As a former nuclear physicist at Princeton University, he brings rigorous analytical thinking and modeling discipline from his physics background to advance ad tech.
2 attendees from this group
Past events
52

