
About us
🖖 This group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events
11

April 8 - Getting Started with FiftyOne
Online · 65 attendees from 48 groups
This workshop provides a technical foundation for managing large-scale computer vision datasets. You will learn to curate, visualize, and evaluate models using the open source FiftyOne app.
Date, Time and Location
Apr 8, 2026
10 - 11 AM Pacific
Online. Register for the Zoom!
The session covers data ingestion, embedding visualization, and model failure analysis. You will build workflows to identify dataset bias, find annotation errors, and select informative samples for training. Attendees leave with a framework for data-centric AI for research and production pipelines, prioritizing data quality over pure model iteration.
What you'll learn
- Structure unstructured data. Map data and metadata into a queryable schema for images, videos, and point clouds.
- Query datasets with the FiftyOne SDK. Create complex views based on model predictions, labels, and custom tags. Use FiftyOne to filter data based on logical conditions and confidence scores.
- Visualize high-dimensional embeddings. Project features into lower dimensions to find clusters of similar samples. Identify data gaps and outliers using FiftyOne Brain.
- Automate data curation. Implement algorithmic measures to select diverse subsets for training. Reduce labeling costs by prioritizing high-entropy samples.
- Debug model performance. Run evaluation routines to generate confusion matrices and precision-recall curves. Visualize false positives and false negatives directly in the App to understand model failures.
- Customize FiftyOne. Build custom dashboards and interactive panels. Create specialized views for domain-specific tasks.
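The idea of prioritizing high-entropy samples for labeling can be sketched in plain Python. This is a minimal illustration, not the workshop's code and not the FiftyOne API: the sample ids, the probabilities, and the helper names `entropy` and `rank_by_entropy` are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_entropy(predictions):
    """Return sample ids ordered from most to least uncertain."""
    return sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)

# Hypothetical class probabilities from a 3-class model (assumed data)
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_002": [0.34, 0.33, 0.33],  # nearly uniform, high entropy
    "img_003": [0.70, 0.20, 0.10],  # somewhere in between
}

ranked = rank_by_entropy(predictions)
print(ranked)  # most uncertain samples come first
```

Samples at the front of `ranked` are the ones the model is least sure about, so labeling them first tends to be more informative than labeling samples it already classifies confidently.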
Prerequisites:
- Working knowledge of Python and machine learning and/or computer vision fundamentals.
- All attendees will get access to the tutorials and code examples used in the workshop.
2 attendees from this group

April 9 - Workshop: Build a Visual Agent that can Navigate GUIs like Humans
Online · 373 attendees from 48 groups
This hands-on workshop provides a comprehensive introduction to building and evaluating visual agents for GUI automation using modern tools and techniques.
Date, Time and Location
April 9, 2026 at 9 AM Pacific
Online. Register for the Zoom
Visual agents that can understand and interact with graphical user interfaces represent a transformative frontier in AI automation. These systems combine computer vision, natural language understanding, and spatial reasoning to enable machines to navigate complex interfaces, from web applications to desktop software, just as humans do. However, building robust GUI agents requires careful attention to dataset curation, model evaluation, and iterative improvement workflows.
Participants will learn how to leverage FiftyOne, an open-source toolkit for dataset curation and computer vision workflows, to build production-ready GUI agent systems.
What You'll Learn:
- Dataset Creation & Management: How to structure, annotate, and load GUI interaction datasets using the COCO4GUI standardized format
- Data Exploration & Analysis: Using FiftyOne's interactive interface to visualize datasets, analyze action distributions, and understand annotation patterns
- Multimodal Embeddings: Computing embeddings for screenshots and UI element patches to enable similarity search and retrieval
- Model Inference: Running state-of-the-art models like Microsoft's GUI-Actor to predict interaction points from natural language instructions
- Performance Evaluation: Measuring model accuracy using standard metrics and normalized click distance to assess localization precision
- Failure Analysis: Investigating model failures through attention maps, error pattern analysis, and systematic debugging workflows
- Data-Driven Improvement: Tagging samples based on error types (attention misalignment vs. localization errors) to prioritize fine-tuning efforts
- Synthetic Data Generation: Using FiftyOne plugins to augment training data with synthetic task descriptions and variations
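The listing does not define normalized click distance, so the sketch below assumes one common formulation: the Euclidean distance between the predicted and ground-truth click points after dividing x by the screenshot width and y by its height. The function names, the bounding-box hit test, and the example coordinates are hypothetical.

```python
import math

def normalized_click_distance(pred, target, width, height):
    """Distance between two (x, y) clicks, with x scaled by width and y by height."""
    dx = (pred[0] - target[0]) / width
    dy = (pred[1] - target[1]) / height
    return math.hypot(dx, dy)

def click_hit(pred, bbox):
    """True if the predicted click lands inside the (x_min, y_min, x_max, y_max) box."""
    x, y = pred
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max

# Hypothetical 1920x1080 screenshot: the prediction is 192 px to the left of the target
distance = normalized_click_distance((960, 540), (1152, 540), 1920, 1080)
hit = click_hit((960, 540), (1100, 500, 1200, 580))
print(distance, hit)
```

Normalizing by screen size keeps the score comparable across screenshots of different resolutions, while the hit test captures whether the click would actually have activated the target element.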
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.
2 attendees from this group

April 16 - AI, ML and Computer Vision Meetup en Español
Online · 13 attendees from 5 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision, presented in Spanish.
Date, Time and Location
Apr 16, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Uncertainty in Large Vision-Language Models and Computer Vision
What if we train a model to classify dogs and cats, but it is later tested with an image of a human? Generally the model will output either dog or cat, and has no ability to signal that the image has no class that it can recognize.
By default, machine learning models do not provide estimates of their confidence or uncertainty, which hinders their use in applications involving humans. One possible solution is to use Bayesian neural networks or similar models.
In this talk I will show research applications of neural networks with uncertainty quantification, covering Computer Vision, Large Language Models and Vision-Language Models. This includes super-resolution, frame generation, verbalized uncertainty, robustness to corrupted inputs, and input uncertainty.
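The dog-versus-cat problem described above can be illustrated with a small sketch. It is not the speaker's method: it uses a tiny ensemble as a stand-in for the "similar models" mentioned, and the logits are invented for illustration. A single softmax output always sums to 1 over the known classes, so it can look confident on an image of a human, whereas disagreement between ensemble members shows up as high entropy in the averaged prediction.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predictive_entropy(member_logits):
    """Entropy (nats) of the mean class probabilities across ensemble members."""
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    mean = [sum(p[i] for p in probs) / len(probs) for i in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0)

# Hypothetical [dog, cat] logits from three ensemble members (assumed values)
dog_image = [[4.0, -1.0], [3.5, -0.5], [4.2, -1.2]]      # members agree
human_image = [[2.0, -2.0], [-2.0, 2.0], [0.1, -0.1]]    # members disagree

single_member = softmax(human_image[0])   # one model alone still looks confident
h_dog = predictive_entropy(dog_image)
h_human = predictive_entropy(human_image)
print(single_member, h_dog, h_human)
```

A threshold on the predictive entropy could then be used to flag inputs such as the human image instead of forcing a dog or cat label.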
About the Speaker
Dr. Matias Valdenegro has been a tenured Assistant Professor of Machine Learning at the Department of Artificial Intelligence, Bernoulli Institute, University of Groningen since 2022. He studied Computer Science, Autonomous Systems, and Electrical Engineering in Chile, Germany, and Scotland, and holds a PhD from Heriot-Watt University with a thesis on detecting marine debris in sonar images. As a Researcher at the German Research Center for Artificial Intelligence in Bremen, he conducted research in Computer Vision and Uncertainty Quantification from 2018 to 2022.
Deep Generative Modeling for Multimodal Human Trajectory Prediction
In this talk, I plan to show how deep generative models can be used as powerful multiple-hypothesis predictive models in human trajectory prediction. This kind of problem arises in particular in autonomous driving. I will present some of our past work and a few ongoing projects in my team.
About the Speaker
Jean-Bernard Hayet earned his engineering degree at Ecole Nationale Supérieure de Techniques Avancées (ENSTA) in Paris and his master's degree in artificial intelligence at University Paris VI. He received his Ph.D. from the University of Toulouse in 2003, at LAAS-CNRS in Toulouse.
When Knowledge Is Open, Innovation Accelerates
In this talk we will show how, when knowledge is open, innovation accelerates because it becomes accessible to any collaborator and not just to a few experts. I will present Promptotyper, a platform created by Innovaitors that integrates open source models and libraries such as LangChain and LangGraph to enable agentic solutions that guide teams from framing the problem through to prototyping.
Through agents that are experts in innovation, teams can structure business challenges and move toward solutions in areas such as automation (for example with n8n), web application prototyping, and data analytics. The approach democratizes innovation know-how in Latin American companies, reducing friction and increasing the speed of learning and execution. By the end, you will see how to turn expertise into a reusable system that scales innovation capabilities across the whole organization.
About the Speaker
Alejandro Uribe is a data scientist, cofounder of Innovaitors, and Industry 4.0 consultant. He holds a master's degree in Artificial Intelligence (U. Javeriana), is a professor at U. Externado and an AI researcher at the Javeriana, and has 6 years of experience developing AI and data analytics solutions.
From Using Open Source to Contributing: A Practical Guide to Getting Started
Open source is one of the best ways to learn faster, build real experience, and grow your career, but many people don’t know how to start. In this talk, I share a very practical approach to contributing to open source, based on real experience. We’ll cover how to choose the right project, understand large codebases, start with small contributions, and communicate clearly with maintainers. Using FiftyOne as a real example (but keeping everything general), I’ll show how small actions like fixing docs, improving tooling, or opening a simple PR can lead to long-term impact, visibility, and growth.
About the Speaker
Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51 with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO.
3 attendees from this group

April 23 - Advances in AI at Johns Hopkins University
Online · 129 attendees from 48 groups
Join our virtual Meetup to hear talks from researchers at Johns Hopkins University on cutting-edge AI topics.
Date, Time and Location
Apr 23, 2026
9 AM Pacific
Online. Register for the Zoom!
Recent Advancements in Image Generation and Understanding
In this talk, I will provide an overview of my research and then take a closer look at three recent works. Image generation has progressed rapidly in the past decade, evolving from Gaussian Mixture Models (GMMs) to Variational Autoencoders (VAEs), GANs, and more recently diffusion models, which have set new standards for quality. I will begin with DiffNat (TMLR’25), which draws inspiration from a simple yet powerful observation: the kurtosis concentration property of natural images. By incorporating a kurtosis concentration loss together with a perceptual guidance strategy, DiffNat can be plugged directly into existing diffusion pipelines, leading to sharper and more faithful generations across tasks such as personalization, super-resolution, and unconditional synthesis.
Continuing the theme of improving quality under constraints, I will then discuss DuoLoRA (ICCV’25), which tackles the challenge of content–style personalization from just a few examples. DuoLoRA introduces adaptive-rank LoRA merging with cycle-consistency, allowing the model to better disentangle style from content. This not only improves personalization quality but also achieves it with 19× fewer trainable parameters, making it far more efficient than conventional merging strategies.
Finally, I will turn to Cap2Aug (WACV’25), which directly addresses data scarcity. This approach uses captions as a bridge for semantic augmentation, applying cross-modal backtranslation (image → text → image) to generate diverse synthetic samples. By aligning real and synthetic distributions, Cap2Aug boosts both few-shot and long-tail classification performance on multiple benchmarks.
About the Speaker
Aniket Roy is currently a Research Scientist at NEC Labs America. He recently earned a PhD from the Computer Science department at Johns Hopkins University under the guidance of Bloomberg Distinguished Professor Rama Chellappa.
From Representation Analysis to Data Refinement: Understanding Correlations in Deep Models
This talk examines how deep learning models encode information beyond their intended objectives and how such dependencies influence reliability, fairness, and generalization. Representation-level analysis using mutual information–based expressivity estimation is introduced to quantify the extent to which attributes such as demographics or anatomical structural factors are implicitly captured in learned embeddings, even when they are not explicitly used for supervision. These analyses reveal hierarchical patterns of attribute encoding and highlight how correlated factors emerge across layers. Data attribution techniques are then discussed to identify influential training samples that contribute to model errors and reinforce dependencies that reduce robustness. By auditing the training data through influence estimation, harmful instances can be identified and removed to improve model behavior. Together, these components highlight a unified, data-centric perspective for analyzing and refining correlations in deep models.
About the Speaker
Basudha Pal is a recent PhD graduate from the Electrical and Computer Engineering Department at Johns Hopkins University. Her research lies at the intersection of computer vision and representation learning, focusing on understanding and refining correlations in deep neural network representations for biometric and medical imaging using mutual information analysis, data attribution, and generative modeling to improve robustness, fairness, and reliability in high-stakes AI systems.
Scalable & Precise Histopathology: Next-Gen Deep Learning for Digital Histopathology
Whole slide images (WSIs) present a unique computational challenge in digital pathology, with single images reaching gigapixel resolution, equivalent to 500+ photos stitched together. This talk presents two complementary deep learning solutions for scalable and accurate WSI analysis. First, I introduce a Task-Specific Self-Supervised Learning (TS-SSL) framework that uses spatial-channel attention to learn domain-optimized feature representations, outperforming existing foundation models across multiple cancer classification benchmarks. Second, I present CEMIL, a contextual attention-based MIL framework that leverages instructor-learner knowledge distillation to classify cancer subtypes using only a fraction of WSI patches, achieving state-of-the-art accuracy with significantly reduced computational cost. Together, these methods address critical bottlenecks in generalization and efficiency for clinical-grade computational pathology.
About the Speaker
Tawsifur Rahman is a Ph.D. candidate in Biomedical Engineering at Johns Hopkins University, advised by Prof. Rama Chellappa and Dr. Alex Baras, with research focused on weakly supervised and self-supervised deep learning for computational pathology. He has completed two clinical data science internships at Johnson & Johnson MedTech and has published extensively in venues including Nature Modern Pathology, Nature Digital Medicine, MIDL, and IEEE WACV, accumulating over 8,500 citations and recognition in Stanford's Top 2% Scientists ranking.
Towards trustworthy AI under real world data challenges
The current paradigm of training AI models relies on fundamental assumptions that the data we have is clean, properly annotated, and sufficiently diverse across domains. However, this is not always true in the real world. In practice, data may be physically corrupt, incompletely annotated, and specific to certain domains. As we move towards large-scale general-purpose models like LLMs and foundation models, it is even more important to address these data challenges so that we can train trustworthy AI models even with noisy real-world data. In this presentation, we discuss some methods to tackle these potential issues.
About the Speaker
Ayush Gupta is a Ph.D. student at the AIEM lab in the Department of Computer Science at Johns Hopkins University. He is advised by Prof. Rama Chellappa and works on problems in Computer Vision and Deep Learning. His research has two focus areas: general-purpose vision-language models, where he works on multimodal LLMs for tasks like VQA, Video Grounding, and LLM interpretability; and fine-grained computer vision problems, where he works on person re-identification and gait recognition.
4 attendees from this group
Past events
82

