
About us
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events

March 26 - Advances in AI at Northeastern University
Online · 216 attendees from 48 groups
Join us to hear about the latest advances in AI at Northeastern University!
Date, Time and Location
March 26, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Scalable and Efficient Deep Learning: From Understanding to Generation
In an era where model complexity and deployment constraints increasingly collide, achieving both scalability and efficiency in deep learning has become essential. Scalable and efficient deep learning ensures that powerful models can be trained, deployed, and adapted under limited computational and data resources, enabling broader accessibility and practical application. From understanding to generation, this talk unifies methods that cut costs while preserving capability.
About the Speaker
Yitian Zhang is a fifth-year PhD student at Northeastern University, advised by Prof. Yun Raymond Fu. His research interests center around Efficient and Scalable AI, spanning Generative Models, Multimodal Large Language Models, and Foundation Models.
Grounding Visual AI Models in Real-World Physics
Generative video models have made rapid progress in visual realism, yet they frequently violate basic physical laws, producing implausible motion and incorrect cause-effect relationships. This talk presents MoReGen, a physics-grounded, agentic text-to-video generation framework that integrates Newtonian physics directly into the generation process via executable physics-engine code.
By coupling vision–language models with trajectory-based physical evaluation and iterative feedback, MoReGen produces videos that are both visually coherent and physically consistent. We further introduce MoRe Metrics and MoReSet, a benchmark and dataset designed to evaluate physics fidelity beyond appearance-based metrics such as FID and FVD. Together, this work demonstrates a path toward visual AI systems that reason about motion, interaction, and causality in the real world rather than hallucinating them.
About the Speakers
Professor Sarah Ostadabbas is an Associate Professor of Electrical and Computer Engineering at Northeastern University, where she directs the Augmented Cognition Lab (ACLab) and serves as Director of Women in Engineering. Her research focuses on computer vision and machine learning, with an emphasis on motion-centric representation learning, small-data AI, and applications in healthcare, defense, and behavior understanding under privacy and data constraints. She has authored over 130 peer-reviewed publications and received numerous honors, including the NSF CAREER Award, Sony Faculty Innovation Award, and the Cade Prize for Inventivity, along with multiple industry and federal research awards.
Xiangyu Bai is a third-year PhD student in the ACLab and leads the lab's work on physics-aware visual intelligence, with several publications in top-tier computer vision and robotics conferences.
WorldFormer: Diffusion Transformer World Models with Mixture-of-Experts for Embodied Physical Intelligence
World models have emerged as a foundational paradigm for enabling agents to simulate, predict, and reason about complex environments. Recent advances driven by diffusion transformer (DiT) architectures have dramatically expanded the fidelity, scalability, and physical plausibility of learned world models. In this work, we present a world model framework built upon the diffusion transformer paradigm, following the design philosophy of state-of-the-art systems such as NVIDIA Cosmos. Our approach comprises three core components: (1) a spatiotemporal variational autoencoder (VAE) that compresses high-resolution video into a compact continuous latent space with strong temporal causality, enabling efficient encoding and decoding of long-horizon video sequences; (2) a transformer-based diffusion backbone that operates on 3D-patchified latent tokens, leveraging self-attention and cross-attention with text embeddings to iteratively denoise Gaussian noise into physically coherent future video states using a flow matching objective; and (3) a scalable pre-training and post-training pipeline that first trains a generalist world foundation model on large-scale, diverse video data and then specializes it to target physical AI domains — such as robotic manipulation, autonomous driving, or embodied navigation — through task-specific fine-tuning.
Our model supports both text-to-world and video-to-world generation, enabling action-conditioned future state prediction for downstream planning and policy learning. We discuss implications for synthetic data generation, sim-to-real transfer, and the integration of world models into vision-language-action (VLA) pipelines for physical AI.
About the Speaker
Yanzhi Wang joined the Electrical & Computer Engineering department in August 2018 as an Assistant Professor. He earned his PhD at the University of Southern California. His research interests include energy-efficient and high-performance implementations of deep learning and artificial intelligence systems; neuromorphic computing and non-von Neumann computing paradigms; cyber-security in deep learning systems; and emerging deep learning algorithms and systems such as Bayesian neural networks, generative adversarial networks (GANs), and deep reinforcement learning.
Physical AI Research (PAIR) Center: Foundational Pairing of Digital Intelligence & Physical World Deployment at Northeastern University and Beyond
The Physical AI Research (PAIR) initiative advances the next frontier of artificial intelligence: enabling systems that can perceive, reason, and act reliably in the physical world. By uniting expertise across engineering, computer science, health sciences, and the social sciences, PAIR develops safe, transparent, and human-aligned AI that bridges digital models with real-world dynamics. The initiative is organized around three intellectual pillars: Learning and Modeling the World, through physics-informed multimodal learning, realistic simulations, and digital twins; Reasoning in the World, by integrating multimodal evidence to support grounded decision-making under uncertainty; and Acting in the World, by ensuring AI systems are verifiable, explainable, energy-efficient, and trustworthy. Together, these efforts position Physical AI as a foundational science driving innovation in health, sustainability, and security.
About the Speaker
Edmund Yeh is the Department Chair of Electrical and Computer Engineering at Northeastern University.
15 attendees from this group

April 2 - AI, ML and Computer Vision Meetup
Online · 251 attendees from 48 groups
Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location
Apr 2, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Async Agents in Production: Failure Modes and Fixes
As models improve, we are starting to build long-running, asynchronous agents such as deep research agents and browser agents that can execute multi-step workflows autonomously. These systems unlock new use cases, but they fail in ways that short-lived agents do not.
The longer an agent runs, the more early mistakes compound, and the more token usage grows through extended reasoning, retries, and tool calls. Patterns that work for request-response agents often break down, leading to unreliable behaviour and unpredictable costs.
This talk is aimed at use case developers, with secondary relevance for platform engineers. It covers the most common failure modes in async agents and practical design patterns for reducing error compounding and keeping token costs bounded in production.
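One way to picture "keeping token costs bounded" is a hard budget that every step of a long-running agent must draw from. The sketch below is only an illustration and is not taken from the talk: `TokenBudget`, `BudgetExceeded`, `run_agent`, and the step names and costs are all hypothetical, and a real system would charge the usage reported by the model API after each call rather than a cost known in advance.

```python
class BudgetExceeded(Exception):
    """Raised when a step would push usage past the configured limit."""


class TokenBudget:
    """Hypothetical per-run token budget shared by all steps of one agent run."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        # Refuse the step up front instead of overspending and failing later.
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used + tokens} > {self.max_tokens}")
        self.used += tokens


def run_agent(steps, budget):
    """Run (name, token_cost) steps in order until done or the budget runs out."""
    completed = []
    for name, cost in steps:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            return completed, "stopped: budget exhausted"
        completed.append(name)
    return completed, "finished"


# Illustrative step costs only; retries count against the same budget.
steps = [("plan", 1200), ("search", 3000), ("retry_search", 3000), ("summarize", 2500)]
budget = TokenBudget(max_tokens=8000)
completed, status = run_agent(steps, budget)
print(completed, status)
```

Because retries and tool calls draw from the same budget, a run that keeps retrying stops at a known ceiling instead of growing without limit.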
About the Speaker
Meryem Arik is the co-founder and CEO of Doubleword, where she works on large-scale LLM inference and production AI systems. She studied theoretical physics and philosophy at the University of Oxford. Meryem is a frequent conference speaker, including a TEDx speaker and a four-time highly rated speaker at QCon conferences. She was named to the Forbes 30 Under 30 list for her work in AI infrastructure.
Visual AI at the Edge: Beyond the Model
Edge-based visual AI promises low latency, privacy, and real-time decision-making, but many projects struggle to move beyond successful demos. This talk explores what deploying visual AI at the edge really involves, shifting the focus from models to complete, operational systems. We will discuss common pitfalls teams encounter when moving from lab to field. Attendees will leave with a practical mental model for approaching edge vision projects more effectively.
About the Speaker
David Moser is an AI/Computer Vision expert and Founding Engineer with a strong track record of building and deploying safety-critical visual AI systems in real-world environments. As Co-Founder of Orella Vision, he is building Visual AI for Autonomy on the Edge - going from data and models to production-grade edge deployments.
Sanitizing Evaluation Datasets: From Detection to Correction
We generally accept that gold standard evaluation sets contain label noise, yet we rarely fix them because the engineering friction is too high. This talk presents a workflow to operationalize ground-truth auditing. We will demonstrate how to bridge the gap between algorithmic error detection and manual rectification. Specifically, we will show how to inspect discordant ground truth labels and correct them directly in-situ. The goal is to move to a fully trusted end-to-end evaluation pipeline.
About the Speaker
Nick Lotz is an engineer on the Voxel51 community team. With a background in open source infrastructure and a passion for developer enablement, Nick focuses on helping teams understand their tools and how to use them to ship faster.
Building enterprise agentic systems that scale
Building AI agents that work in demos is easy; building true assistants that make people genuinely productive takes a different set of patterns. This talk shares lessons from a multi-agent system at Cisco used by 2,000+ sellers daily, where we moved past "chat with your data" to encoding business workflows into true agentic systems people actually rely on to get work done.
We'll cover multi-agent orchestration patterns for complex workflows, the personalization and productivity features that drive adoption, and the enterprise foundations that helped us earn user trust at scale. You'll leave with an architecture and set of patterns that have been battle tested at enterprise scale.
About the Speaker
Aman Sardana is a Senior Engineering Architect at Cisco, leading the design and deployment of enterprise AI systems that blend LLMs, data infrastructure, and customer experience to solve high-stakes, real-world problems at scale. Aman is also an open-source contributor and active mentor in the AI community, helping teams move from AI experimentation to reliable agentic applications in production.
12 attendees from this group

April 8 - Getting Started with FiftyOne
Online · 43 attendees from 48 groups
This workshop provides a technical foundation for managing large-scale computer vision datasets. You will learn to curate, visualize, and evaluate models using the open source FiftyOne app.
Date, Time and Location
Apr 8, 2026
10 - 11 AM Pacific
Online. Register for the Zoom!
The session covers data ingestion, embedding visualization, and model failure analysis. You will build workflows to identify dataset bias, find annotation errors, and select informative samples for training. Attendees leave with a framework for data-centric AI for research and production pipelines, prioritizing data quality over pure model iteration.
What you'll learn
- Structure unstructured data. Map data and metadata into a queryable schema for images, videos, and point clouds.
- Query datasets with the FiftyOne SDK. Create complex views based on model predictions, labels, and custom tags. Use the FiftyOne SDK to filter data based on logical conditions and confidence scores.
- Visualize high-dimensional embeddings. Project features into lower dimensions to find clusters of similar samples. Identify data gaps and outliers using FiftyOne Brain.
- Automate data curation. Implement algorithmic measures to select diverse subsets for training. Reduce labeling costs by prioritizing high-entropy samples.
- Debug model performance. Run evaluation routines to generate confusion matrices and precision-recall curves. Visualize false positives and false negatives directly in the App to understand model failures.
- Customize FiftyOne. Build custom dashboards and interactive panels. Create specialized views for domain-specific tasks.
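The curation item above mentions prioritizing high-entropy samples. As a rough illustration in plain Python (this is not the FiftyOne API, and the workshop may use a different measure), the sketch below ranks samples by the Shannon entropy of their predicted class probabilities. The sample IDs and probability values are made up for the example.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_entropy(predictions, k):
    """Return the k sample IDs whose predictions are most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: prediction_entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical softmax outputs keyed by sample ID (illustrative values only).
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_002": [0.40, 0.35, 0.25],  # near-uniform prediction, high entropy
    "img_003": [0.70, 0.20, 0.10],
}

to_label = rank_by_entropy(predictions, k=2)
print(to_label)
```

Samples with near-uniform predictions rise to the top of the ranking, so a limited labeling budget goes to the images the model is least sure about.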
Prerequisites:
- Working knowledge of Python and machine learning and/or computer vision fundamentals.
- All attendees will get access to the tutorials and code examples used in the workshop.
1 attendee from this group

April 9 - Workshop: Build a Visual Agent that can Navigate GUIs like Humans
Online · 192 attendees from 48 groups
This hands-on workshop provides a comprehensive introduction to building and evaluating visual agents for GUI automation using modern tools and techniques.
Date, Time and Location
April 9, 2026 at 9 AM Pacific
Online. Register for the Zoom!
Visual agents that can understand and interact with graphical user interfaces represent a transformative frontier in AI automation. These systems combine computer vision, natural language understanding, and spatial reasoning to enable machines to navigate complex interfaces, from web applications to desktop software, just as humans do. However, building robust GUI agents requires careful attention to dataset curation, model evaluation, and iterative improvement workflows.
Participants will learn how to leverage FiftyOne, an open-source toolkit for dataset curation and computer vision workflows, to build production-ready GUI agent systems.
What You'll Learn:
- Dataset Creation & Management: How to structure, annotate, and load GUI interaction datasets using the COCO4GUI standardized format
- Data Exploration & Analysis: Using FiftyOne's interactive interface to visualize datasets, analyze action distributions, and understand annotation patterns
- Multimodal Embeddings: Computing embeddings for screenshots and UI element patches to enable similarity search and retrieval
- Model Inference: Running state-of-the-art models like Microsoft's GUI-Actor to predict interaction points from natural language instructions
- Performance Evaluation: Measuring model accuracy using standard metrics and normalized click distance to assess localization precision
- Failure Analysis: Investigating model failures through attention maps, error pattern analysis, and systematic debugging workflows
- Data-Driven Improvement: Tagging samples based on error types (attention misalignment vs. localization errors) to prioritize fine-tuning efforts
- Synthetic Data Generation: Using FiftyOne plugins to augment training data with synthetic task descriptions and variations
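The evaluation item above refers to normalized click distance. The listing does not define the metric, so the sketch below assumes one common reading: the Euclidean distance between the predicted and ground-truth click points after dividing x by the screenshot width and y by its height. The function names, the box format `(x, y, w, h)`, and the example coordinates are all hypothetical.

```python
import math


def normalized_click_distance(pred_xy, true_xy, width, height):
    """Distance between two click points in resolution-independent units.

    Assumed definition: x offsets are scaled by image width and y offsets by
    image height, so scores are comparable across screenshot sizes.
    """
    dx = (pred_xy[0] - true_xy[0]) / width
    dy = (pred_xy[1] - true_xy[1]) / height
    return math.hypot(dx, dy)


def click_in_box(pred_xy, box):
    """True if the click lands inside the target element box (x, y, w, h) in pixels."""
    x, y, w, h = box
    return x <= pred_xy[0] <= x + w and y <= pred_xy[1] <= y + h


# Hypothetical case: 1920x1080 screenshot, button at (900, 500) sized 120x40.
pred = (1000, 560)
target_center = (960, 520)
dist = normalized_click_distance(pred, target_center, 1920, 1080)
hit = click_in_box(pred, (900, 500, 120, 40))
print(f"distance={dist:.4f}, hit={hit}")
```

Reporting both values separates near misses (small distance, no hit) from predictions that land on an unrelated part of the screen.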
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.
11 attendees from this group

