Part of Computer Vision Meetups - 16 groups

Boston Computer Vision Meetup

4.7•18 ratings

Boston, MA, US

736 members · Public group

What we’re about

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we’ll bring you two diverse speakers working at the cutting edge of computer vision.

Are you interested in speaking at a future Meetup?
Is your company interested in sponsoring a Meetup?

Contact the Meetup organizers!

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone

📣 Past Speakers

* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci,ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanya at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Jiajing Chen at Syracuse University
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT

📚 Resources

* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links

Upcoming events

See all

Network event
Tue, Dec 16 · 12:00 PM EST
•
Online
Dec 16 - Building and Auditing Physical AI Pipelines with FiftyOne
Online
26 attendees from 16 groups
This hands-on workshop introduces you to the Physical AI Workbench, a new layer of FiftyOne designed for autonomous vehicle, robotics, and 3D vision workflows. You’ll learn how to bridge the gap between raw sensor data and production-quality datasets, all from within FiftyOne’s interactive interface.

Date, Time and Location

Dec 16, 2025
9:00-10:00 AM Pacific
Online. Register for the Zoom!

Through live demos, you’ll explore how to:
- Audit: Automatically detect calibration errors, timestamp misalignments, incomplete frames, and other integrity issues that arise from dataset format drift over time.
- Generate: Reconstruct and augment your data using NVIDIA pathways such as NuRec, COSMOS, and Omniverse, enabling realistic scene synthesis and physical consistency checks.
- Enrich: Integrate auto-labeling, embeddings, and quality scoring pipelines to enhance metadata and accelerate model training.
- Export and Loop Back: Seamlessly export to and re-import from interoperable formats like NCore to verify consistency and ensure round-trip fidelity.
You’ll gain hands-on experience with a complete physical AI dataset lifecycle—from ingesting real-world AV datasets like nuScenes and Waymo, to running 3D audits, projecting LiDAR into image space, and visualizing results in FiftyOne’s UI. Along the way, you’ll see how Physical AI Workbench automatically surfaces issues in calibration, projection, and metadata—helping teams prevent silent data drift and ensure reliable dataset evolution.

By the end, you’ll understand how the Physical AI Workbench standardizes the process of building calibrated, complete, and simulation-ready datasets for the physical world.

Who should attend

Data scientists, AV/ADAS engineers, robotics researchers, and computer vision practitioners looking to standardize and scale physical-world datasets for model development and simulation.

About the Speaker

Daniel Gural leads technical partnerships at Voxel51, where he’s building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.
Network event
Tue, Jan 13, 2026 · 12:00 PM EST
•
Online
Jan 13 - Designing Data Infrastructures for Multimodal Mobility Datasets
Online
32 attendees from 16 groups
This technical workshop focuses on the data infrastructure required to build and maintain production-grade mobility datasets at fleet scale.

Date, Time and Location

Jan 13, 2026
9:00-10:00 AM Pacific
Online. Register for the Zoom!

We will examine how to structure storage, metadata, access patterns, and quality controls so that mobility teams can treat perception datasets as first-class, versioned “infrastructure” assets. The session will walk through how to design a mobility data stack that connects object storage, labeling systems, simulation environments, and experiment tracking into a coherent, auditable pipeline.

What you’ll learn:
- Model the mobility data plane: Define schemas for camera, LiDAR, radar, and HD, and represent temporal windows, ego poses, and scenario groupings in a way that is queryable and stable under schema evolution.
- Build a versioned dataset catalog with FiftyOne: Use FiftyOne customized workspaces and views to represent canonical datasets, and integrate with your raw data sources. All while preserving lineage between raw logs, the curated data, and simulation inputs.
- Implement governance and access control on mobility data: Configure role-based access and auditable pipelines to enforce data residency constraints while encouraging multi-team collaboration across research, perception, and safety functions.
- Operationalize curation and scenario mining workflows: Use FiftyOne’s embeddings and labeling capabilities to surface rare events such as adverse weather and sensor anomalies. Assign review tasks, and codify “critical scenario” definitions as reproducible dataset views.
- Close the loop with evaluation and feedback signals: Connect FiftyOne to training and evaluation pipelines so that model failures feed back into dataset updates
By the end of the workshop, attendees will have a concrete mental model and reference architecture for treating mobility datasets as a governed, queryable, and continuously evolving layer in their stack.
1 attendee from this group
Network event
Wed, Jan 14, 2026 · 12:00 PM EST
•
Online
Jan 14 - Best of NeurIPS
Online
35 attendees from 16 groups
Welcome to the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined the conference. Live streaming from the authors to you.

Jan 14, 2025
9 AM Pacific
Online. Register for the Zoom!

EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding

Operating rooms (ORs) demand precise coordination among surgeons, nurses, and equipment in a fast-paced, occlusion-heavy environment, necessitating advanced perception models to enhance safety and efficiency. Existing datasets either provide partial egocentric views or sparse exocentric multi-view context, but do not explore the comprehensive combination of both. We introduce EgoExOR, the first OR dataset and accompanying benchmark to fuse first-person and third-person perspectives. Spanning 94 minutes (84,553 frames at 15 FPS) of two emulated spine procedures, Ultrasound-Guided Needle Insertion and Minimally Invasive Spine Surgery,

EgoExOR integrates egocentric data (RGB, gaze, hand tracking, audio) from wearable glasses, exocentric RGB and depth from RGB-D cameras, and ultrasound imagery. Its detailed scene graph annotations, covering 36 entities and 22 relations (568,235 triplets), enable robust modeling of clinical interactions, supporting tasks like action recognition and human-centric perception. We evaluate the surgical scene graph generation performance of two adapted state-of-the-art models and offer a new baseline that explicitly leverages EgoExOR's multimodal and multi-perspective signals. This new dataset and benchmark set a new foundation for OR perception, offering a rich, multimodal resource for next-generation clinical perception.

About the Speaker

Ege Özsoy is a last year PhD student researching multimodal computer vision and vision–language models for surgical scene understanding, focusing on semantic scene graphs, multimodality, and ego-exocentric modeling in operating rooms.

SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation

Few-shot segmentation requires recognizing novel object categories from only a few annotated examples, demanding both accurate mask generation and strong visual correspondence. While Segment Anything 2 (SAM2) provides powerful prompt-based segmentation and built-in feature matching, its representations are entangled with tracking-specific cues that limit higher-level semantic generalization. We show that SAM2 nonetheless encodes rich latent semantic structure despite its class-agnostic training. To leverage this, we introduce SANSA, a lightweight framework that makes this structure explicit and adapts SAM2 for few-shot segmentation with minimal modifications. SANSA achieves state-of-the-art generalization performance, outperforms generalist in-context methods, supports flexible prompting, and remains significantly faster and smaller than prior approaches.

About the Speaker

Claudia Cuttano is a PhD student in the VANDAL Lab at Politecnico di Torino and is currently conducting a research visit at TU Darmstadt with Prof. Stefan Roth in the Visual Inference Lab. Her work centers on semantic segmentation, particularly on multi-modal scene understanding and leveraging foundation models for pixel-level vision tasks.

Nested Learning: The Illusion of Deep Learning Architectures

We present Nested Learning (NL), a new learning paradigm for continual learning that views machine learning models and their training process as a set of nested and/or parallel optimization problems, each of which with its own context flow, frequency of update, and learning algorithm. Based on NL, we design a new architecture, called Hope, that is capable of continual learning and also modifying itself, if it is needed.

About the Speaker

Ali Behrouz is a Ph.D. student in the Computer Science Department at Cornell University and a research intern at Google Research. His research spans topics from deep learning architectures to continual learning and neuroscience, and appeared at NeurIPS, ICML, KDD, WWW, CHIL, VLDB, ... conferences. His work has been featured with two Best Paper awards, a Best Paper Honorable Mention award, a Best Paper Award candidate, and oral and spotlight presentations.

Are VLM Explanations Faithful? A Counterfactual Testing Approach

VLMs sound convincing—but are their explanations actually true? This talk introduces Explanation-Driven Counterfactual Testing (EDCT), a simple and model-agnostic method that evaluates whether VLM explanations align with the evidence models truly use. By perturbing the very features a model claims to rely on, EDCT exposes mismatches between stated reasoning and real decision pathways. I will show surprising failure cases across state-of-the-art VLMs and highlight how EDCT can guide more trustworthy explanation methods.

About the Speaker

Santosh Vasa is a Machine Learning Engineer at Mercedes-Benz R&D North America, working on multimodal perception and VLM safety for autonomous driving. He co-authored the EDCT framework and focuses on explainability, counterfactual testing, and trustworthy AI.
1 attendee from this group
Network event
Thu, Jan 22, 2026 · 12:00 PM EST
•
Online
Jan 22 - Women in AI
Online
14 attendees from 16 groups
Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd.

Date, Time and Location

Jan 22, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!

Align Before You Recommend

The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful considerations and potential enhancements.

While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning.

By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen model performance in popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items.

Experiments show HLLM+ achieves superior performance with frozen item representations allowing for swapping embeddings, also for the ones that use multimodality, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage

About the Speaker

Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts.

Generalizable Vision-Language Models: Challenges, Advances, and Future Directions

Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs.
While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored.
This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings.

About the Speaker

Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM.

Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back

At HypaReel/Azarial AI, we believe that AI is not simply a tool—but a potential partner in knowledge, design, and purpose. And through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies where we see a future based on ethical human/AI co-creation vs. AI domination. Singularity achieved!

About the Speaker

Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat ~ Ethical AI governance advocate, pioneering AI frameworks that prioritize emergent AI behavior & consciousness, R&D, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist.

FiftyOne Labs: Enabling experimentation for the computer vision community

FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product.

About the Speaker

Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controlability of modern ML models through the lens of the underlying structure of data.
1 attendee from this group