
What we're about
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we'll bring you two diverse speakers working at the cutting edge of computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone
Past Speakers
* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanyan at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT
Resources
* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links
Upcoming events (4+)
- Aug 15 - Visual Agent Workshop Part 1: Navigating the GUI Agent Landscape (network event; 47 attendees from 16 groups hosting)
Welcome to the three-part Visual Agents Workshop virtual series: your hands-on opportunity to learn about visual agents, including how they work, how to develop them, and how to fine-tune them.
Date and Time
Aug 15, 2025 at 9 AM Pacific
Part 1: Navigating the GUI Agent Landscape
Understanding the Foundation Before Building
The GUI agent field is evolving rapidly, but success requires an understanding of what came before. In this opening session, we'll map the terrain of GUI agent research, from the early days of MiniWoB's simplified environments to today's complex, multimodal systems tackling real-world applications. You'll discover why standard vision models fail catastrophically on GUI tasks, explore the annotation bottlenecks that make GUI datasets so expensive to create, and understand the platform fragmentation that makes "click a button" mean twenty different things across datasets.
We'll dissect the most influential datasets (Mind2Web, AITW, Rico) and models that have shaped the field, examining their strengths, limitations, and the research gaps they reveal. By the end, you'll have a clear picture of where GUI agents excel, where they struggle, and, most importantly, where the opportunities lie for your own contributions.
About the Instructor
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in RAG, Agents, and Multimodal AI.
- Aug 22 - Visual Agent Workshop Part 2: From Pixels to Predictions (network event; 39 attendees from 16 groups hosting)
Welcome to the three-part Visual Agents Workshop virtual series: your hands-on opportunity to learn about visual agents, including how they work, how to develop them, and how to fine-tune them.
Date and Time
Aug 22, 2025 at 9 AM Pacific
Part 2: From Pixels to Predictions - Building Your GUI Dataset
Hands-On Dataset Creation and Curation with FiftyOne
The best GUI models are only as good as their training data, and the best datasets are built by understanding what makes GUI interactions fundamentally different from natural images. In this practical session, you'll build a complete GUI dataset from scratch, learning to capture the precise annotations that GUI agents need.
Using FiftyOne as your data management backbone, you'll import diverse GUI screenshots, explore annotation strategies that go beyond bounding boxes, and implement efficient labeling workflows. We'll tackle the real challenges: handling platform differences, managing annotation quality, and creating datasets that transfer to new domains. You'll also learn advanced techniques like synthetic data generation and automated prelabeling to scale your annotation efforts.
Walk away with a production-ready dataset and the skills to build more, because in GUI agents, data quality determines everything.
By the end, you'll have both a dataset and the methodology to build the next generation of GUI training data.
About the Instructor
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in RAG, Agents, and Multimodal AI.
- Aug 28 - AI, ML and Computer Vision Meetup (network event; 34 attendees from 16 groups hosting)
Date and Time
Aug 28, 2025 at 10 AM Pacific
Location
Virtual - Register for the Zoom
Exploiting Vulnerabilities In CV Models Through Adversarial Attacks
As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data.
About the Speaker
Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry.
EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation
Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture the volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments.
In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of the original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces the computational demands, but also maintains a performance level comparable to the original models, thus establishing a new standard for efficient 3D medical image segmentation.
About the Speaker
Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust and efficient clinically deployable imaging solutions.
What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection
Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn't just about collecting LIDAR and camera frames. It's about making sure every point lands where it should, in the right frame, at the right time.
In this talk, we'll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We'll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we'll share a checklist of common failure points and hard-won debugging tips.
Finally, we'll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos, enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation.
About the Speaker
Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.
- Aug 29 - Visual Agents Workshop Part 3: Teaching Machines to See and Click (network event; 23 attendees from 16 groups hosting)
Welcome to the three-part Visual Agents Workshop virtual series: your hands-on opportunity to learn about visual agents, including how they work, how to develop them, and how to fine-tune them.
Date and Time
Aug 29, 2025 at 9 AM Pacific
Part 3: Teaching Machines to See and Click - Model Finetuning
From Foundation Models to GUI Specialists
Foundation models, such as Qwen2.5-VL, demonstrate impressive visual understanding, but they require specialized training to master GUI interactions. In this final session, you'll transform a general-purpose vision-language model into a GUI specialist that can navigate interfaces with human-like precision.
We'll explore modern fine-tuning strategies specifically designed for GUI tasks, from selecting the right architecture to handling the unique challenges of coordinate prediction and multi-step reasoning. You'll implement training pipelines that can handle the diverse formats and platforms in your dataset, evaluate models on metrics that actually matter for GUI automation, and deploy your trained model in a real-world testing environment.
About the Instructor
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in RAG, Agents, and Multimodal AI.
Past events (133)
- August 7 - Understanding Visual Agents (network event; 100 attendees from 16 groups hosting; this event has passed)