
About us
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we'll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (11)

May 14 - AI, ML and Computer Vision Meetup
Online · 410 attendees from 48 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location
May 14, 2026
9:00-11:00 AM Pacific
Online. Register for the Zoom!
Concept-Aware Batch Sampling Improves Language-Image Pretraining
What data should a vision-language model be trained on, and who gets to decide what "good data" even means? Most existing curation pipelines are limited because they are offline (they produce a static dataset from a set of predetermined filtering criteria) and concept-agnostic (they rely on model-based scores that can silently introduce new biases in what concepts the model sees). In this talk, I will discuss our new work, CABS, which tackles both of these problems with large-scale sample-level concept annotations and flexible online batch sampling.
First, we construct DataConcept, a 128M web-crawled image-text collection annotated with fine-grained concept composition, and show how this enables Concept-Aware Batch Sampling (CABS), a simple online method that constructs training batches on-the-fly to match target concept distributions. We develop two variants, CABS-DM for maximizing concept coverage and CABS-FM for prioritizing high object multiplicity, and demonstrate consistent gains for CLIP/SigLIP-style models across 28 benchmarks.
Finally, I'll show that these improvements translate into strong vision encoders for training generative multimodal models, including autoregressive systems like LLaVA, where the encoder quality materially affects downstream capability.
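The core idea of online, concept-aware batch construction can be sketched in a few lines. The following is a minimal, illustrative greedy sampler over hypothetical per-sample concept annotations; it is not the CABS implementation described in the talk:

```python
import random
from collections import Counter

def concept_aware_batch(pool, target_dist, batch_size, rng=random):
    """Greedily build a batch whose concept histogram tracks a target
    distribution. Each pool item is (sample_id, set_of_concepts).

    Illustrative sketch only: the actual CABS method and its DataConcept
    annotations are described in the talk, not reproduced here.
    """
    batch, counts = [], Counter()
    candidates = list(pool)
    rng.shuffle(candidates)
    for _ in range(batch_size):
        # Score each candidate by how much the batch's concept
        # frequencies would deviate from the target after adding it
        def gap_after(item):
            counts_new = counts + Counter(item[1])
            total = sum(counts_new.values()) or 1
            return sum(
                abs(counts_new[c] / total - p) for c, p in target_dist.items()
            )
        best = min(candidates, key=gap_after)
        candidates.remove(best)
        batch.append(best[0])
        counts.update(best[1])
    return batch
```

For example, sampling from a pool annotated with only "cat" and "dog" concepts against a uniform target yields a balanced batch, whereas uniform random sampling would only match the target in expectation.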
About the Speaker
Adhiraj Ghosh is a first year ELLIS PhD student, working with Matthias Bethge at The University of Tübingen. He completed his undergraduate degree in Electrical and Electronics Engineering jointly at the Manipal Institute of Technology and SMU Singapore from 2016 to 2020, and his masters in Machine Learning at The University of Tübingen from 2022 to 2024.
Do Your Agents Actually Work? Measuring Skills and MCP in Practice
This talk shows how to evaluate agent performance in real scenarios using FiftyOne Skills and MCP. We will cover practical ways to design scenarios, run agents, and measure how they use tools, including signals like latency, token usage, and output quality. The goal is to move beyond final outputs and better understand agent behavior, helping teams build more reliable and measurable agent systems.
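As a rough illustration of the signals the talk covers, here is a minimal, hypothetical instrumentation wrapper that records latency, an approximate token count, and success per tool call. It is not the FiftyOne Skills or MCP API, just a sketch of the measurement idea:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ToolCallRecord:
    tool: str
    latency_s: float
    approx_tokens: int
    ok: bool

@dataclass
class AgentTrace:
    calls: list = field(default_factory=list)

    def measure(self, tool_name, fn, *args, **kwargs):
        """Run a tool call, recording latency, a rough token count,
        and whether it succeeded. Illustrative only."""
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            ok = True
        except Exception:
            result, ok = None, False
        latency = time.perf_counter() - start
        # Crude token estimate: roughly 4 characters per token for English
        tokens = len(str(result)) // 4 if result is not None else 0
        self.calls.append(ToolCallRecord(tool_name, latency, tokens, ok))
        return result
```

Aggregating such records across a scenario gives per-tool latency and usage profiles, which is the kind of behavioral signal the talk argues for measuring beyond final outputs.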
About the Speaker
Adonai Vera is a Machine Learning Engineer and DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.
The last mile of OCR [in 2026]
OCR models excel on benchmarks, but the real work lies in the long tail of IDP. Large tables, old scans, mixed-language documents, handwriting, and complex layouts are where most enterprise and real-world document work happens, and they are where the best-benchmarked models still struggle. In this talk, we will go through how LandingAI's Agentic Document Extraction (ADE) goes beyond OCR and parsing to enable real-world document AI use cases and workloads.
We'll cover:
- The pillars of Agentic Document Extraction
- Building document processing pipelines with ADE API/SDK
- Using Skills to have Coding Agents build for you
- How ADE gives LLMs the last mile: analyzing LLM performance on large tables, scanned docs, and complex layouts, and enabling them with the structured output from ADE
About the Speaker
Ankit Khare has built the Developer Relations function at high-growth startups like Rockset (a world-class retrieval system, later acquired by OpenAI), Twelve Labs (a video intelligence startup backed by Index Ventures, Radical Ventures, and NEA), and Abacus.AI (an AI Super Assistant backed by Index Ventures, Eric Schmidt, and Ram Shriram). Before that, he was an AI engineer at third insight and an AI researcher at the LEARN Lab at UT-Arlington, working on visual scene understanding and image captioning agents.
The Energy Layer of AI: Powering the Next Wave of Inference
The talk explores how inference cost is fundamentally tied to energy at scale, especially as the AI industry shifts toward always-on, agent-driven workloads and the focus moves from training to inference economics. Medi will share lessons and observations from his team's R&D efforts to make AI workloads grid-aware, energy-intelligent, and dynamically optimized in real time.
About the Speaker
Medi Naseri is the Founder and CEO of LÅD Technologies, where he leads the development of energy-intelligent infrastructure for flexible data centers and the broader compute ecosystem.
With a PhD in Electrical Engineering specializing in control and power systems, Medi brings deep technical expertise to the challenge of scaling AI within real-time grid constraints.

May 20 - Getting Started with FiftyOne
Online · 97 attendees from 48 groups
This workshop provides a technical foundation for managing large scale computer vision datasets. You will learn to curate, visualize, and evaluate models using the open source FiftyOne app.
Date, Time and Location
May 20, 2026
10:00 - 11:00 AM Pacific
Online. Register for the Zoom!
The session covers data ingestion, embedding visualization, and model failure analysis. You will build workflows to identify dataset bias, find annotation errors, and select informative samples for training. Attendees leave with a framework for data-centric AI in research and production pipelines, prioritizing data quality over pure model iteration.
What you'll learn
- Structure unstructured data. Map data and metadata into a queryable schema for images, videos, and point clouds.
- Query datasets with the FiftyOne SDK. Create complex views based on model predictions, labels, and custom tags. Use FiftyOne to filter data based on logical conditions and confidence scores.
- Visualize high dimensional embeddings. Project features into lower dimensions to find clusters of similar samples. Identify data gaps and outliers using FiftyOne Brain.
- Automate data curation. Implement algorithmic measures to select diverse subsets for training. Reduce labeling costs by prioritizing high entropy samples.
- Debug model performance. Run evaluation routines to generate confusion matrices and precision recall curves. Visualize false positives and false negatives directly in the App to understand model failures.
- Customize FiftyOne. Build custom dashboards and interactive panels. Create specialized views for domain specific tasks.
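As a taste of the curation idea above, here is a minimal sketch of entropy-based sample prioritization in plain Python; the workshop itself uses FiftyOne's built-in tooling for this, so treat the function below as illustrative only:

```python
import math

def top_entropy_samples(predictions, k):
    """Rank samples by the Shannon entropy of their predicted class
    probabilities and return the k most uncertain sample IDs.

    Illustrative sketch: predictions maps sample_id -> list of class
    probabilities; high entropy means the model is unsure, so these
    samples are the most informative to label next.
    """
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)

    ranked = sorted(
        predictions.items(), key=lambda kv: entropy(kv[1]), reverse=True
    )
    return [sample_id for sample_id, _ in ranked[:k]]
```

A near-uniform prediction (e.g. [0.5, 0.5]) ranks above a confident one (e.g. [0.99, 0.01]), which is exactly the ordering that cuts labeling cost when budgets are tight.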
Prerequisites:
- Working knowledge of Python and machine learning and/or computer vision fundamentals.
- All attendees will get access to the tutorials and code examples used in the workshop.

May 21 - Women in AI Meetup
Online · 222 attendees from 48 groups
Hear talks from experts on the latest topics in AI, ML, and computer vision on May 21.
Date, Time and Location
May 21, 2026
9:00 - 11:00 AM Pacific
Online. Register for the Zoom!
Beyond Models: LLM-Guided Reinforcement Learning for Real-World Wireless Systems
Reinforcement learning agents often perform well in simulation but break down when deployed in real, non-stationary, constraint-driven environments such as wireless systems. This work explores using large language models not as annotators or reward hacks, but as a reasoning layer that guides RL decision-making with domain logic, scenario interpretation, and adaptive constraints.
We present an architecture where the LLM provides structured, high-level advisory signals while the RL policy remains the final action authority to avoid hallucination-driven failures. Early experiments show that this hybrid setup improves robustness under distribution shifts and complex constraint scenarios where standard RL collapses. The goal is not to replace RL with LLMs, but to combine learning and reasoning into a more deployable control-intelligence framework.
About the Speaker
Fatemeh Lotfi is a Ph.D. researcher focused on integrating large language models and reinforcement learning for adaptive wireless control systems. Her work targets the limitations of classical RL under real-world uncertainty by introducing reasoning-driven guidance mechanisms using LLMs. She has contributed to multiple AI-for-infrastructure projects, including advanced O-RAN automation.
Responsible and Ethical AI in Healthcare: Building Trustworthy and Inclusive Intelligent Systems
In this session, I will discuss how Responsible AI principles, including fairness, transparency, accountability, and reliability, can be practically embedded into healthcare AI systems. Key discussion points will include:
- Addressing bias and equity challenges in healthcare datasets and model training.
- Building explainable and interpretable AI to strengthen clinician trust and adoption.
- Ensuring ethical deployment of generative AI models within regulated healthcare environments.
- Establishing governance frameworks for data privacy, model monitoring, and regulatory compliance.
About the Speaker
Jahnavi Kachhia is the Global Product Owner, AI & ML at Abbott, leading large-scale AI initiatives for the FreeStyle Libre platform to enhance clinical decision-making and patient outcomes. Previously at Meta's Reality Labs, she advanced AR/VR innovation and LLM-based intelligent systems. An active contributor to the AI research community, she serves on the IJCAI 2025 Program Committee and reviews for AAAI, IJCNN, and IEEE conferences.
AI Applications in Drug Repurposing
Drug repurposing is increasingly important because it offers a faster, lower-cost path to therapeutic discovery compared to de novo drug development, especially in oncology where many cancers still lack effective targeted options. In under-studied cancers such as endometrial cancer, the challenge is often a lack of large, high-quality clinical or response datasets, making purely data-dependent approaches difficult to scale reliably. This motivates combining data-independent strategies (e.g., pathway- and mechanism-driven modeling) with data-dependent learning when interaction evidence is available. A practical and scalable direction is drugātarget interaction (DTI) prediction, where AI models can leverage molecular and protein representations to prioritize mechanistically plausible drug candidates for repurposing.
About the Speaker
Madhurima Mondal's academic journey has been shaped by strong foundations in mathematical and scientific problem-solving, including multiple national-level achievements such as the Regional Mathematics Olympiad (RMO), NTSE, and the KVPY fellowship. She completed her B.Tech and M.Tech in Electronics & Electrical Communication Engineering at IIT Kharagpur and is currently a PhD candidate in Electrical & Computer Engineering at Texas A&M University.
Mapping to Belonging: How Ethically Governed AI Can Make Real Places More Accessible, Legible, and Human
Can AI help people belong in the places where they live, work, travel, and get together?
This talk explores that question through real-world work at the intersection of accessibility, computer vision mapping, civic data, and ethically governed AI. I will show how AI can support the collection and interpretation of pedestrian accessibility data, reduce the burden of documenting barriers, and help transform lived experience into structured information that can be used across routing tools, planning systems, and public decision-making. I will also argue that public-interest AI only works when it is governed well. In accessibility work, the risks are clear: over-averaging, hidden bias, false completeness, and systems that optimize for efficiency while overlooking the people most affected by missing or poor-quality data. Ethically governed AI must therefore be designed to preserve local context, support transparency, include community participation, and make room for experiences that conventional systems often ignore.
About the Speaker
Anat Caspi is Director of the Taskar Center for Accessible Technology at the University of Washington, where she leads research and public-interest technology efforts focused on accessibility, mobility, and inclusive transportation data.

May 27 - Perceptron AI and FiftyOne for Video Understanding Workshop
Online · 56 attendees from 48 groups
Join us for a hands-on virtual session on May 27 exploring video-native multimodal AI and how to integrate cutting-edge video understanding models into your computer vision workflows.
Date, Time and Location
May 27, 2026
9:00 - 11:00 AM Pacific
Online. Register for the Zoom!
Video-Native Multimodal Models for Video and Image Understanding
In this 20-minute talk, Akshat will introduce Perceptron's latest release, a video-native multimodal model that matches or exceeds frontier models from Google and Alibaba on video and image understanding at a fraction of their inference cost. He'll walk through the capabilities that move the needle for real video workloads: temporal grounding to clip precise events from long streams, egocentric reasoning for first-person and wearable contexts, and structured "thinking traces" that reason over motion and physical space. He'll also cover the image-side advances production perception teams care about: reliable pointing, point-by-example one-shot visual search, dense counting, dial/gauge/clock reading, and structured document extraction.
About the Speaker
Akshat Shrivastava is the CTO and co-founder of Perceptron, previously leading AR On-Device at Meta and conducting research at UW.
Getting Started with Perceptron AI in FiftyOne
In the second half of the session, Harpreet Sahota will walk through how to get started using Perceptron's video-native multimodal model within FiftyOne for real-world video understanding workflows. He'll demonstrate how to connect to the API, explore multimodal outputs inside FiftyOne, and build practical workflows for tasks like temporal event analysis, visual search, and video dataset inspection. Attendees will leave with a hands-on understanding of how to integrate state-of-the-art video perception models into their existing computer vision pipelines.
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.
Past events (231)

