About us

This group is for sharing ideas and experience in the field of computer vision from both industry and academic experts.
Join to share your inspiring ideas, connect, and create new opportunities with fellow members.

Sponsors

Versatile

Hosting April 2021 event

Cloudinary

Sponsoring Sep 2018 meetup

Healthy.io

Sponsoring Aug 2018 meetup

LEO pharma

Sponsoring July 2018 meetup

Upcoming events

  • Network event
    Feb 26 - Exploring Video Datasets with FiftyOne and Vision-Language Models

    Online · 139 attendees from 48 groups

    Join Harpreet Sahota for a virtual workshop to learn how to use Facebook's Action100M dataset and FiftyOne to build an end-to-end workflow.

    Date, Time and Location

    Feb 26, 2026
    9am - 10am Pacific
    Online. Register for the Zoom!

    Video is the hardest modality to work with. You're dealing with more data, temporal complexity, and annotation workflows that don't scale. This hands-on workshop tackles a practical question: given a large video dataset, how do you understand what's in it without manually watching thousands of clips?

    In this workshop you'll learn how to:

    • Navigate and explore video data in the FiftyOne App, filter samples, and understand dataset structure
    • Compute embeddings with Qwen3-VL to enable semantic search, zero-shot classification, and clustering
    • Generate descriptions and localize events using vision-language models like Qwen3-VL and Molmo2
    • Visualize patterns in your data through embedding projections and the FiftyOne App
    • Evaluate model outputs against Action100M's hierarchical annotations to validate what the models actually capture

    By the end of the session, you'll have a reusable toolkit for understanding any video dataset at scale, whether you're curating training data, debugging model performance, or exploring a new domain.
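The semantic search and clustering steps above all reduce to comparing embedding vectors. As a rough illustration of the idea (toy code, not the FiftyOne or Qwen3-VL API used in the workshop; all names here are hypothetical), ranking clips against a query embedding by cosine similarity might look like:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_emb, clip_embs, top_k=3):
    """Rank clip embeddings by similarity to a query embedding.

    In practice the embeddings would come from a vision-language model
    such as Qwen3-VL; here they are just plain vectors.
    """
    scores = [(i, cosine_similarity(query_emb, e)) for i, e in enumerate(clip_embs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]
```

Zero-shot classification follows the same pattern: embed each class name as the query and assign each clip to its nearest label.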

    7 attendees from this group
  • Network event
    March 12 - Agents, MCP and Skills Virtual Meetup

    Online · 349 attendees from 48 groups

    Join us for a special edition of the AI, ML and Computer Vision Meetup where we will focus on Agents, MCP and Skills!

    Date, Time and Location

    Mar 12, 2026
    9 - 11 AM PST
    Online.
    Register for the Zoom!

    Agents Building Agents on the Hugging Face Hub

    Discover how coding agents can run or support your fine-tuning experiments: from quick dataset validation and preprocessing, to optimal GPU hardware selection, to automated job submission based on metric tracking, to evaluation. Ben will demonstrate how Hugging Face skills can be used to define best practices for agents to support machine learning experiments. Bring Claude, Codex, or Mistral Vibes, and we'll show you how to get it training models with GRPO, SFT, and DPO.

    About the Speaker

    Ben Burtenshaw is a Machine Learning Engineer at Hugging Face, focusing on building agents with fine-tuning and reinforcement learning. He led educational projects like the Agents Course, the MCP Course, and the LLM course, which bridge the gap between complex Reinforcement Learning (RL) techniques and practical application. Ben focuses on democratizing access to efficient AI, empowering the community to align, evaluate, and deploy transparent agentic systems.

    Claude Code Templates

    This talk explores how to configure and align Claude Code agents using templates and custom components. I'll demonstrate practical configuration patterns that ensure your CLI agent executes exactly what you intend, covering Skills setup, hooks implementation, and template customization. Drawing on real-world examples from building Claude Code Templates, I'll show how to structure agent configurations for consistent, reliable behavior and how to create reusable components that maintain alignment across different use cases.

    About the Speaker

    Daniel Avila is an AI Engineer at Hedgineer building agentic systems and creator of Claude Code Templates.

    Move Faster in Computer Vision by Teaching Agents to See Your Data

    Computer vision teams spend too much time writing scripts just to find bad labels, blurry images, and edge cases. In this talk, I’ll show how to move that work to agents by using FiftyOne as a visual operating system. With Skills and MCP, agents can see inside your datasets, explore them visually, and handle common data cleanup tasks, so you can spend less time on data and more time shipping models.

    About the Speaker

    Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO. Today, he connects code and community to build open, production-ready AI, making technology simple, accessible, and reliable.

    Skills As Documentation

    Skills are self-contained recipes - each one a piece of a larger puzzle. Instead of trying to modify human-centric documentation to better fit agents, skills let us build capabilities into our agents directly. This talk will showcase how to think about leveraging skills to enhance how users interact with your software!

    About the Speaker

    Chris Alexiuk is a deep learning developer advocate at NVIDIA, working on creating technical assets that help developers use the incredible suite of AI tools available at NVIDIA. Chris comes from a machine learning and data science background, and he is obsessed with everything and anything about large language models.

    24 attendees from this group
  • Network event
    March 18 - Vibe Coding Production-Ready Computer Vision Pipelines Workshop

    Online · 186 attendees from 48 groups

    Join us for an interactive workshop where we'll build production-ready computer vision pipelines using vibe coded FiftyOne plugins.

    Register for the Zoom

    Plugins enable you to customize the open-source FiftyOne computer vision app to match your exact workflow by easily incorporating data annotation, curation, model evaluation and inference.

    We'll demonstrate how FiftyOne Skills and the MCP Server can streamline the journey from prototype to production-ready pipelines, keeping your development flow intact.

    Perfect for open-source contributors, researchers, and enterprise teams seeking hands-on experience. All participants receive slides, notebooks, and access to GitHub repositories and videos from the workshop.

    19 attendees from this group
  • Network event
    April 2 - AI, ML and Computer Vision Meetup

    Online · 161 attendees from 48 groups

    Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Date, Time and Location

    Apr 2, 2026
    9 - 11 AM Pacific
    Online.
    Register for the Zoom!

    Async Agents in Production: Failure Modes and Fixes

    As models improve, we are starting to build long-running, asynchronous agents such as deep research agents and browser agents that can execute multi-step workflows autonomously. These systems unlock new use cases, but they fail in ways that short-lived agents do not.

    The longer an agent runs, the more early mistakes compound, and the more token usage grows through extended reasoning, retries, and tool calls. Patterns that work for request-response agents often break down, leading to unreliable behaviour and unpredictable costs.

    This talk is aimed at use case developers, with secondary relevance for platform engineers. It covers the most common failure modes in async agents and practical design patterns for reducing error compounding and keeping token costs bounded in production.
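One simple pattern for keeping token costs bounded, in the spirit of the talk's theme, is a hard budget the agent checks before every step. A minimal sketch under assumed names (not code from the talk):

```python
class TokenBudget:
    """Cap cumulative token spend across an agent's steps, retries, and tool calls."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens

    @property
    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)

    def exhausted(self) -> bool:
        return self.used >= self.max_tokens

def run_agent(steps, budget):
    """Execute steps until done or the budget runs out.

    Each step is a callable returning (tokens_spent, done); a real agent
    loop would plan its next action instead of iterating a fixed list.
    """
    for step in steps:
        if budget.exhausted():
            return "aborted: budget exhausted"
        tokens, done = step()
        budget.charge(tokens)
        if done:
            return "done"
    return "done"
```

Aborting with a clear status, rather than retrying indefinitely, is what keeps long-running agents from compounding both errors and costs.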

    About the Speaker

    Meryem Arik is the co-founder and CEO of Doubleword, where she works on large-scale LLM inference and production AI systems. She studied theoretical physics and philosophy at the University of Oxford. Meryem is a frequent conference speaker, including a TEDx speaker and a four-time highly rated speaker at QCon conferences. She was named to the Forbes 30 Under 30 list for her work in AI infrastructure.

    Visual AI at the Edge: Beyond the Model

    Edge-based visual AI promises low latency, privacy, and real-time decision-making, but many projects struggle to move beyond successful demos. This talk explores what deploying visual AI at the edge really involves, shifting the focus from models to complete, operational systems. We will discuss common pitfalls teams encounter when moving from lab to field. Attendees will leave with a practical mental model for approaching edge vision projects more effectively.

    About the Speaker

    David Moser is an AI/Computer Vision expert and Founding Engineer with a strong track record of building and deploying safety-critical visual AI systems in real-world environments. As Co-Founder of Orella Vision, he is building Visual AI for Autonomy on the Edge - going from data and models to production-grade edge deployments.

    Sanitizing Evaluation Datasets: From Detection to Correction

    We generally accept that gold standard evaluation sets contain label noise, yet we rarely fix them because the engineering friction is too high. This talk presents a workflow to operationalize ground-truth auditing. We will demonstrate how to bridge the gap between algorithmic error detection and manual rectification. Specifically, we will show how to inspect discordant ground truth labels and correct them directly in-situ. The goal is to move to a fully trusted end-to-end evaluation pipeline.
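The detection half of such a workflow often amounts to surfacing samples where a confident model disagrees with the ground truth. A minimal sketch of that filter (the sample schema here is an assumption for illustration, not the talk's actual tooling):

```python
def flag_discordant(samples, conf_threshold=0.9):
    """Flag samples whose ground-truth label disagrees with a high-confidence prediction.

    Each sample is a dict with 'gt', 'pred', and 'conf' keys (assumed schema).
    High-confidence disagreements are the best candidates for manual review,
    since they are more likely to be label errors than model errors.
    """
    return [s for s in samples if s["pred"] != s["gt"] and s["conf"] >= conf_threshold]
```

In a tool like FiftyOne, the flagged samples would then be opened for visual inspection and corrected in place.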

    About the Speaker

    Nick Lotz is an engineer on the Voxel51 community team. With a background in open source infrastructure and a passion for developer enablement, Nick focuses on helping teams understand their tools and how to use them to ship faster.

    Building Enterprise Agentic Systems That Scale

    Building AI agents that work in demos is easy; building true assistants that make people genuinely productive takes a different set of patterns. This talk shares lessons from a multi-agent system at Cisco used by 2,000+ sellers daily, where we moved past "chat with your data" to encoding business workflows into true agentic systems people actually rely on to get work done.

    We'll cover multi-agent orchestration patterns for complex workflows, the personalization and productivity features that drive adoption, and the enterprise foundations that helped us earn user trust at scale. You'll leave with an architecture and set of patterns that have been battle tested at enterprise scale.

    About the Speaker

    Aman Sardana is a Senior Engineering Architect at Cisco, where he leads the design and deployment of enterprise AI systems that blend LLMs, data infrastructure, and customer experience to solve high-stakes, real-world problems at scale. He is also an open-source contributor and active mentor in the AI community, helping teams move from AI experimentation to reliable agentic applications in production.

    14 attendees from this group

Members

4,291