
About us
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we'll bring you two diverse speakers working at the cutting edge of computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone
Past Speakers
* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanyan at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT
Resources
* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links
Upcoming events

April 2 - AI, ML and Computer Vision Meetup
Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location
Apr 2, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Async Agents in Production: Failure Modes and Fixes
As models improve, we are starting to build long-running, asynchronous agents such as deep research agents and browser agents that can execute multi-step workflows autonomously. These systems unlock new use cases, but they fail in ways that short-lived agents do not.
The longer an agent runs, the more early mistakes compound, and the more token usage grows through extended reasoning, retries, and tool calls. Patterns that work for request-response agents often break down, leading to unreliable behaviour and unpredictable costs.
This talk is aimed at use case developers, with secondary relevance for platform engineers. It covers the most common failure modes in async agents and practical design patterns for reducing error compounding and keeping token costs bounded in production.
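To make the compounding argument concrete: if each step of an agent run succeeds independently with probability p, an n-step run succeeds end to end with probability p^n. The sketch below illustrates that arithmetic plus one generic mitigation, a bounded retry loop with a hard token budget. The independence assumption, the hypothetical `run_step` callable, and the budget numbers are illustrative choices, not patterns taken from the talk.

```python
from typing import Callable, Tuple


def run_success_probability(per_step_success: float, num_steps: int) -> float:
    """Probability that every step succeeds, ASSUMING steps fail independently."""
    return per_step_success ** num_steps


class BudgetExceeded(RuntimeError):
    """Raised when cumulative token usage goes over the run's budget."""


def run_with_budget(
    run_step: Callable[[int], Tuple[bool, int]],
    num_steps: int,
    token_budget: int,
    max_retries: int = 2,
) -> int:
    """Run steps in order, retrying failures, and abort once the budget is exceeded.

    run_step(i) is a hypothetical callable returning (succeeded, tokens_used).
    Returns the total number of tokens consumed by a successful run.
    """
    tokens_used = 0
    for step in range(num_steps):
        for attempt in range(max_retries + 1):
            ok, cost = run_step(step)
            tokens_used += cost
            if tokens_used > token_budget:
                raise BudgetExceeded(f"budget exceeded at step {step}, attempt {attempt}")
            if ok:
                break
        else:
            # Every attempt for this step failed: stop instead of compounding the error.
            raise RuntimeError(f"step {step} failed after {max_retries + 1} attempts")
    return tokens_used
```

For example, `run_success_probability(0.99, 100)` is about 0.366: even a 99% per-step success rate leaves only a roughly 37% chance that a 100-step run completes without a single error.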
About the Speaker
Meryem Arik is the co-founder and CEO of Doubleword, where she works on large-scale LLM inference and production AI systems. She studied theoretical physics and philosophy at the University of Oxford. Meryem is a frequent conference speaker, including a TEDx speaker and a four-time highly rated speaker at QCon conferences. She was named to the Forbes 30 Under 30 list for her work in AI infrastructure.
Visual AI at the Edge: Beyond the Model
Edge-based visual AI promises low latency, privacy, and real-time decision-making, but many projects struggle to move beyond successful demos. This talk explores what deploying visual AI at the edge really involves, shifting the focus from models to complete, operational systems. We will discuss common pitfalls teams encounter when moving from lab to field. Attendees will leave with a practical mental model for approaching edge vision projects more effectively.
About the Speaker
David Moser is an AI/Computer Vision expert and Founding Engineer with a strong track record of building and deploying safety-critical visual AI systems in real-world environments. As Co-Founder of Orella Vision, he is building Visual AI for Autonomy on the Edge - going from data and models to production-grade edge deployments.
Sanitizing Evaluation Datasets: From Detection to Correction
We generally accept that gold-standard evaluation sets contain label noise, yet we rarely fix them because the engineering friction is too high. This talk presents a workflow to operationalize ground-truth auditing. We will demonstrate how to bridge the gap between algorithmic error detection and manual correction. Specifically, we will show how to inspect discordant ground-truth labels and correct them directly in situ. The goal is a fully trusted end-to-end evaluation pipeline.
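One simple way to surface discordant ground truth is to flag samples where a confident model prediction disagrees with the stored label, then queue them for manual review and write back the reviewer's correction. The sketch below shows that detect-then-correct loop in plain Python; the `Sample` record, its field names, and the 0.9 threshold are hypothetical, and this is not the FiftyOne workflow presented in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    sample_id: str
    ground_truth: str
    prediction: str
    confidence: float


def find_discordant(samples: List[Sample], min_confidence: float = 0.9) -> List[Sample]:
    """Return samples where a confident prediction disagrees with the ground-truth label."""
    return [
        s for s in samples
        if s.prediction != s.ground_truth and s.confidence >= min_confidence
    ]


def apply_corrections(samples: List[Sample], corrections: Dict[str, str]) -> None:
    """Overwrite ground-truth labels in place with reviewer-approved corrections."""
    for s in samples:
        if s.sample_id in corrections:
            s.ground_truth = corrections[s.sample_id]
```

Low-confidence disagreements are deliberately left out of the review queue, on the assumption that they more likely reflect model error than label noise.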
About the Speaker
Nick Lotz is an engineer on the Voxel51 community team. With a background in open source infrastructure and a passion for developer enablement, Nick focuses on helping teams understand their tools and how to use them to ship faster.
Building enterprise agentic systems that scale
Building AI agents that work in demos is easy; building true assistants that make people genuinely productive takes a different set of patterns. This talk shares lessons from a multi-agent system at Cisco used daily by 2,000+ sellers, where we moved past "chat with your data" to encoding business workflows into agentic systems people actually rely on to get work done.
We'll cover multi-agent orchestration patterns for complex workflows, the personalization and productivity features that drive adoption, and the enterprise foundations that helped us earn user trust at scale. You'll leave with an architecture and set of patterns that have been battle tested at enterprise scale.
About the Speaker
Aman Sardana is a Senior Engineering Architect at Cisco who leads the design and deployment of enterprise AI systems that blend LLMs, data infrastructure, and customer experience to solve high-stakes, real-world problems at scale. Aman is also an open-source contributor and active mentor in the AI community, helping teams move from AI experimentation to reliable agentic applications in production.

April 8 - Getting Started with FiftyOne
This workshop provides a technical foundation for managing large-scale computer vision datasets. You will learn to curate and visualize datasets and evaluate models using the open source FiftyOne App.
Date, Time and Location
Apr 8, 2026
10 - 11 AM Pacific
Online. Register for the Zoom!
The session covers data ingestion, embedding visualization, and model failure analysis. You will build workflows to identify dataset bias, find annotation errors, and select informative samples for training. Attendees leave with a framework for data-centric AI in research and production pipelines, prioritizing data quality over pure model iteration.
What you'll learn
- Structure unstructured data. Map data and metadata into a queryable schema for images, videos, and point clouds.
- Query datasets with the FiftyOne SDK. Create complex views based on model predictions, labels, and custom tags. Use FiftyOne to filter data based on logical conditions and confidence scores.
- Visualize high-dimensional embeddings. Project features into lower dimensions to find clusters of similar samples. Identify data gaps and outliers using FiftyOne Brain.
- Automate data curation. Implement algorithmic measures to select diverse subsets for training. Reduce labeling costs by prioritizing high-entropy samples.
- Debug model performance. Run evaluation routines to generate confusion matrices and precision-recall curves. Visualize false positives and false negatives directly in the App to understand model failures.
- Customize FiftyOne. Build custom dashboards and interactive panels. Create specialized views for domain-specific tasks.
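The idea of prioritizing high-entropy samples for labeling can be illustrated with a small uncertainty-sampling sketch: compute the Shannon entropy of each sample's predicted class distribution and send the most uncertain samples for annotation first. This is plain Python over an assumed input (a dict mapping sample ids to probability lists); it is not the FiftyOne Brain API.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k samples whose predicted distributions have the highest entropy."""
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:k]
```

A near-uniform prediction such as `[0.34, 0.33, 0.33]` ranks ahead of a confident one such as `[0.98, 0.01, 0.01]`, so the labeling budget goes where the model is least sure.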
Prerequisites:
- Working knowledge of Python and machine learning and/or computer vision fundamentals.
- All attendees will get access to the tutorials and code examples used in the workshop.

April 9 - Workshop: Build a Visual Agent that can Navigate GUIs like Humans
This hands-on workshop provides a comprehensive introduction to building and evaluating visual agents for GUI automation using modern tools and techniques.
Date, Time and Location
April 9, 2026 at 9 AM Pacific
Online. Register for the Zoom!
Visual agents that can understand and interact with graphical user interfaces represent a transformative frontier in AI automation. These systems combine computer vision, natural language understanding, and spatial reasoning to enable machines to navigate complex interfaces, from web applications to desktop software, just as humans do. However, building robust GUI agents requires careful attention to dataset curation, model evaluation, and iterative improvement workflows.
Participants will learn how to leverage FiftyOne, an open-source toolkit for dataset curation and computer vision workflows, to build production-ready GUI agent systems.
What You'll Learn:
- Dataset Creation & Management: How to structure, annotate, and load GUI interaction datasets using the COCO4GUI standardized format
- Data Exploration & Analysis: Using FiftyOne's interactive interface to visualize datasets, analyze action distributions, and understand annotation patterns
- Multimodal Embeddings: Computing embeddings for screenshots and UI element patches to enable similarity search and retrieval
- Model Inference: Running state-of-the-art models like Microsoft's GUI-Actor to predict interaction points from natural language instructions
- Performance Evaluation: Measuring model accuracy using standard metrics and normalized click distance to assess localization precision
- Failure Analysis: Investigating model failures through attention maps, error pattern analysis, and systematic debugging workflows
- Data-Driven Improvement: Tagging samples based on error types (attention misalignment vs. localization errors) to prioritize fine-tuning efforts
- Synthetic Data Generation: Using FiftyOne plugins to augment training data with synthetic task descriptions and variations
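The outline mentions normalized click distance without defining it. One plausible reading, assumed here, is the Euclidean distance between the predicted and ground-truth click points divided by the screenshot diagonal, so scores are comparable across screen resolutions; a complementary check is whether the click lands inside the target element's bounding box. Both definitions and the `(x, y)` / `(x_min, y_min, x_max, y_max)` conventions are assumptions for illustration, not the metric as defined by GUI-Actor or the workshop.

```python
import math
from typing import Tuple

Point = Tuple[float, float]                # (x, y) in pixels
Box = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max) in pixels


def normalized_click_distance(pred: Point, target: Point, width: int, height: int) -> float:
    """Euclidean distance between two clicks divided by the image diagonal (0.0 = exact hit)."""
    diagonal = math.hypot(width, height)
    return math.dist(pred, target) / diagonal


def click_in_box(pred: Point, box: Box) -> bool:
    """True if the predicted click falls inside the target element's bounding box."""
    x, y = pred
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max
```

Normalizing by the diagonal keeps the distance in the range 0 to 1 for any click inside the screenshot, which makes it easy to tag samples by localization error.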
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in RAG, Agents, and Multimodal AI.

April 23 - Advances in AI at Johns Hopkins University
Join our virtual Meetup to hear talks from researchers at Johns Hopkins University on cutting-edge AI topics.
Date, Time and Location
Apr 23, 2026
9 AM Pacific
Online. Register for the Zoom!
Recent Advancements in Image Generation and Understanding
In this talk, I will provide an overview of my research and then take a closer look at three recent works. Image generation has progressed rapidly in the past decade, evolving from Gaussian Mixture Models (GMMs) to Variational Autoencoders (VAEs), GANs, and more recently diffusion models, which have set new standards for quality. I will begin with DiffNat (TMLR '25), which draws inspiration from a simple yet powerful observation: the kurtosis concentration property of natural images. By incorporating a kurtosis concentration loss together with a perceptual guidance strategy, DiffNat can be plugged directly into existing diffusion pipelines, leading to sharper and more faithful generations across tasks such as personalization, super-resolution, and unconditional synthesis.
Continuing the theme of improving quality under constraints, I will then discuss DuoLoRA (ICCV '25), which tackles the challenge of content-style personalization from just a few examples. DuoLoRA introduces adaptive-rank LoRA merging with cycle-consistency, allowing the model to better disentangle style from content. This not only improves personalization quality but also achieves it with 19× fewer trainable parameters, making it far more efficient than conventional merging strategies.
Finally, I will turn to Cap2Aug (WACV '25), which directly addresses data scarcity. This approach uses captions as a bridge for semantic augmentation, applying cross-modal backtranslation (image → text → image) to generate diverse synthetic samples. By aligning real and synthetic distributions, Cap2Aug boosts both few-shot and long-tail classification performance on multiple benchmarks.
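The kurtosis concentration property says that, for natural images, the kurtosis of different band-pass filtered versions of an image stays roughly constant. A minimal sketch of a loss built on that idea is the spread (max minus min) of kurtosis values across bands. Here the band responses are passed in as plain lists and the band-pass filtering step (for example, a wavelet transform) is omitted; this is an illustrative assumption about the general shape of such a loss, not DiffNat's implementation.

```python
from statistics import fmean
from typing import List, Sequence


def kurtosis(values: Sequence[float]) -> float:
    """Population (non-excess) kurtosis: E[(x - mu)^4] / sigma^4."""
    mu = fmean(values)
    var = fmean((v - mu) ** 2 for v in values)
    if var == 0:
        raise ValueError("kurtosis is undefined for constant input")
    return fmean((v - mu) ** 4 for v in values) / var ** 2


def kurtosis_spread(bands: List[Sequence[float]]) -> float:
    """Max minus min kurtosis across band responses; 0.0 when all bands agree."""
    ks = [kurtosis(band) for band in bands]
    return max(ks) - min(ks)
```

Minimizing this spread pushes generated images toward the roughly constant cross-band kurtosis that natural images exhibit.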
About the Speaker
Aniket Roy is currently a Research Scientist at NEC Labs America. He recently earned a PhD from the Computer Science department at Johns Hopkins University under the guidance of Bloomberg Distinguished Professor Rama Chellappa.
From Representation Analysis to Data Refinement: Understanding Correlations in Deep Models
This talk examines how deep learning models encode information beyond their intended objectives and how such dependencies influence reliability, fairness, and generalization. Representation-level analysis using mutual information-based expressivity estimation is introduced to quantify the extent to which attributes such as demographics or anatomical structural factors are implicitly captured in learned embeddings, even when they are not explicitly used for supervision. These analyses reveal hierarchical patterns of attribute encoding and highlight how correlated factors emerge across layers. Data attribution techniques are then discussed to identify influential training samples that contribute to model errors and reinforce dependencies that reduce robustness. By auditing the training data through influence estimation, harmful instances can be identified and removed to improve model behavior. Together, these components highlight a unified, data-centric perspective for analyzing and refining correlations in deep models.
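The expressivity estimator referred to above is not specified here. As a much simpler stand-in, the sketch below computes the plug-in (empirical) mutual information between two discrete variables, for example an attribute label and a cluster id assigned to each embedding. The discretization step and the variable pairing are assumptions for illustration; a value near 0 suggests the clusters carry little information about the attribute.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
        mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

Plug-in estimates are biased upward on small samples, so in practice they are compared against a shuffled-label baseline rather than read as absolute values.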
About the Speaker
Basudha Pal is a recent PhD graduate from the Electrical and Computer Engineering Department at Johns Hopkins University. Her research lies at the intersection of computer vision and representation learning, focusing on understanding and refining correlations in deep neural network representations for biometric and medical imaging using mutual information analysis, data attribution, and generative modeling to improve robustness, fairness, and reliability in high-stakes AI systems.
Scalable & Precise Histopathology: Next-Gen Deep Learning for Digital Histopathology
Whole slide images (WSIs) present a unique computational challenge in digital pathology, with single images reaching gigapixel resolution, equivalent to 500+ photos stitched together. This talk presents two complementary deep learning solutions for scalable and accurate WSI analysis. First, I introduce a Task-Specific Self-Supervised Learning (TS-SSL) framework that uses spatial-channel attention to learn domain-optimized feature representations, outperforming existing foundation models across multiple cancer classification benchmarks. Second, I present CEMIL, a contextual attention-based MIL framework that leverages instructor-learner knowledge distillation to classify cancer subtypes using only a fraction of WSI patches, achieving state-of-the-art accuracy with significantly reduced computational cost. Together, these methods address critical bottlenecks in generalization and efficiency for clinical-grade computational pathology.
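Attention-based multiple-instance learning (MIL) treats a slide as a bag of patch embeddings: each patch gets a score, the scores are softmax-normalized into weights, and the weighted average becomes the slide-level representation fed to a classifier. The sketch below shows only that generic pooling step, in plain Python, with a single hypothetical linear scoring vector `w` (a simplification of the usual learned attention network). It does not reproduce CEMIL's contextual attention or its instructor-learner distillation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patches: List[Vector], w: Vector) -> List[float]:
    """Pool patch embeddings into one slide embedding with weights softmax(w . h_i)."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in patches]
    weights = softmax(scores)
    dim = len(patches[0])
    return [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]
```

Because the weights are explicit, they can also be inspected per patch to see which regions of the slide drove a prediction.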
About the Speaker
Tawsifur Rahman is a Ph.D. candidate in Biomedical Engineering at Johns Hopkins University, advised by Prof. Rama Chellappa and Dr. Alex Baras, with research focused on weakly supervised and self-supervised deep learning for computational pathology. He has completed two clinical data science internships at Johnson & Johnson MedTech and has published extensively in venues including Nature Modern Pathology, Nature Digital Medicine, MIDL, and IEEE WACV, accumulating over 8,500 citations and recognition in Stanford's Top 2% Scientists ranking.