
About us

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.

Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.

  • Are you interested in speaking at a future Meetup?
  • Is your company interested in sponsoring a Meetup?

Send me a DM on LinkedIn

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolkit. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events

  • Network event
    April 2 - AI, ML and Computer Vision Meetup

    Online
    410 attendees from 48 groups

    Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Date, Time and Location

    Apr 2, 2026
    9 - 11 AM Pacific
    Online.
    Register for the Zoom!

    Async Agents in Production: Failure Modes and Fixes

    As models improve, we are starting to build long-running, asynchronous agents such as deep research agents and browser agents that can execute multi-step workflows autonomously. These systems unlock new use cases, but they fail in ways that short-lived agents do not.

    The longer an agent runs, the more early mistakes compound, and the more token usage grows through extended reasoning, retries, and tool calls. Patterns that work for request-response agents often break down, leading to unreliable behaviour and unpredictable costs.

    This talk is aimed at use case developers, with secondary relevance for platform engineers. It covers the most common failure modes in async agents and practical design patterns for reducing error compounding and keeping token costs bounded in production.
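
One way to keep token costs bounded is a hard budget that aborts instead of retrying. A minimal sketch under assumptions: the `call` interface returning `(result, tokens_used)`, the exception type, and the retry cap are all illustrative, not from the talk.

```python
class TokenBudgetExceeded(Exception):
    """Raised when an agent run hits its hard token cap."""

def run_step(call, budget_remaining, max_retries=3):
    # `call` is a hypothetical step function returning (result, tokens_used);
    # result is None on a recoverable failure.
    for _ in range(max_retries):
        result, tokens = call()
        budget_remaining -= tokens
        if budget_remaining < 0:
            # Abort rather than retry: retries are where costs compound.
            raise TokenBudgetExceeded("hard token cap reached")
        if result is not None:
            return result, budget_remaining
    # Surfacing failures early limits downstream error compounding.
    raise RuntimeError("step failed after retries")
```

The key design choice is that the budget check runs before the retry decision, so a run can never exceed its cap by looping.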

    About the Speaker

    Meryem Arik is the co-founder and CEO of Doubleword, where she works on large-scale LLM inference and production AI systems. She studied theoretical physics and philosophy at the University of Oxford. A frequent conference speaker, she has given a TEDx talk and is a four-time highly rated speaker at QCon. She was named to the Forbes 30 Under 30 list for her work in AI infrastructure.

    Visual AI at the Edge: Beyond the Model

    Edge-based visual AI promises low latency, privacy, and real-time decision-making, but many projects struggle to move beyond successful demos. This talk explores what deploying visual AI at the edge really involves, shifting the focus from models to complete, operational systems. We will discuss common pitfalls teams encounter when moving from lab to field. Attendees will leave with a practical mental model for approaching edge vision projects more effectively.

    About the Speaker

    David Moser is an AI/Computer Vision expert and Founding Engineer with a strong track record of building and deploying safety-critical visual AI systems in real-world environments. As Co-Founder of Orella Vision, he is building Visual AI for Autonomy on the Edge - going from data and models to production-grade edge deployments.

    Sanitizing Evaluation Datasets: From Detection to Correction

    We generally accept that gold standard evaluation sets contain label noise, yet we rarely fix them because the engineering friction is too high. This talk presents a workflow to operationalize ground-truth auditing. We will demonstrate how to bridge the gap between algorithmic error detection and manual rectification. Specifically, we will show how to inspect discordant ground truth labels and correct them directly in-situ. The goal is to move to a fully trusted end-to-end evaluation pipeline.
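
In the spirit of the detection step, here is a minimal, library-free sketch of flagging discordant labels; the sample dict keys and the confidence threshold are assumptions, and FiftyOne's actual auditing workflow is richer than this.

```python
def flag_discordant(samples, conf_threshold=0.9):
    # Flag samples where a confident model prediction disagrees with the
    # ground-truth label: prime suspects for annotation errors.
    return [
        s for s in samples
        if s["pred"] != s["label"] and s["conf"] >= conf_threshold
    ]
```

Flagged samples are candidates for manual review, not automatic relabeling; the correction step stays human-in-the-loop.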

    About the Speaker

    Nick Lotz is an engineer on the Voxel51 community team. With a background in open source infrastructure and a passion for developer enablement, Nick focuses on helping teams understand their tools and how to use them to ship faster.

    Building enterprise agentic systems that scale

    Building AI agents that work in demos is easy; building true assistants that make people genuinely productive takes a different set of patterns. This talk shares lessons from a multi-agent system at Cisco used by 2,000+ sellers daily, where we moved past "chat with your data" to encoding business workflows into true agentic systems people actually rely on to get work done.

    We'll cover multi-agent orchestration patterns for complex workflows, the personalization and productivity features that drive adoption, and the enterprise foundations that helped us earn user trust at scale. You'll leave with an architecture and set of patterns that have been battle tested at enterprise scale.

    About the Speaker

    Aman Sardana is a Senior Engineering Architect at Cisco, where he leads the design and deployment of enterprise AI systems that blend LLMs, data infrastructure, and customer experience to solve high-stakes, real-world problems at scale. He is also an open-source contributor and active mentor in the AI community, helping teams move from AI experimentation to reliable agentic applications in production.

    23 attendees from this group
  • Network event
    April 8 - Getting Started with FiftyOne

    Online
    54 attendees from 48 groups

    This workshop provides a technical foundation for managing large-scale computer vision datasets. You will learn to curate, visualize, and evaluate models using the open source FiftyOne App.

    Date, Time and Location

    Apr 8, 2026
    10 - 11 AM Pacific
    Online. Register for the Zoom!

    The session covers data ingestion, embedding visualization, and model failure analysis. You will build workflows to identify dataset bias, find annotation errors, and select informative samples for training. Attendees leave with a framework for data-centric AI in research and production pipelines, prioritizing data quality over pure model iteration.

    What you'll learn

    • Structure unstructured data. Map data and metadata into a queryable schema for images, videos, and point clouds.
    • Query datasets with the FiftyOne SDK. Create complex views based on model predictions, labels, and custom tags. Use the FiftyOne SDK to filter data based on logical conditions and confidence scores.
    • Visualize high dimensional embeddings. Project features into lower dimensions to find clusters of similar samples. Identify data gaps and outliers using FiftyOne Brain.
    • Automate data curation. Implement algorithmic measures to select diverse subsets for training. Reduce labeling costs by prioritizing high entropy samples.
    • Debug model performance. Run evaluation routines to generate confusion matrices and precision recall curves. Visualize false positives and false negatives directly in the App to understand model failures.
    • Customize FiftyOne. Build custom dashboards and interactive panels. Create specialized views for domain specific tasks.
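
As a library-free stand-in for the confidence filtering described above, here is a sketch over plain dicts; in the FiftyOne SDK itself this is a one-liner along the lines of `dataset.filter_labels("predictions", F("confidence") > 0.75)`, and the dict layout below is an assumption for illustration.

```python
def filter_by_confidence(samples, field="predictions", min_conf=0.75):
    # Keep only detections above the cutoff; drop samples left empty.
    # `samples` is a list of dicts standing in for dataset samples.
    out = []
    for s in samples:
        kept = [d for d in s[field] if d["confidence"] > min_conf]
        if kept:
            out.append({**s, field: kept})
    return out
```

Like a FiftyOne view, this is non-destructive: it builds a filtered copy rather than mutating the underlying samples.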

    Prerequisites:

    • Working knowledge of Python and machine learning and/or computer vision fundamentals.
    • All attendees will get access to the tutorials and code examples used in the workshop.
    3 attendees from this group
  • Network event
    April 9 - Workshop: Build a Visual Agent that can Navigate GUIs like Humans

    Online
    316 attendees from 48 groups

    This hands-on workshop provides a comprehensive introduction to building and evaluating visual agents for GUI automation using modern tools and techniques.

    Date, Time and Location

    April 9, 2026 at 9 AM Pacific
    Online.
    Register for the Zoom

    Visual agents that can understand and interact with graphical user interfaces represent a transformative frontier in AI automation. These systems combine computer vision, natural language understanding, and spatial reasoning to enable machines to navigate complex interfaces—from web applications to desktop software—just as humans do. However, building robust GUI agents requires careful attention to dataset curation, model evaluation, and iterative improvement workflows.

    Participants will learn how to leverage FiftyOne, an open-source toolkit for dataset curation and computer vision workflows, to build production-ready GUI agent systems.

    What You'll Learn:

    • Dataset Creation & Management: How to structure, annotate, and load GUI interaction datasets using the COCO4GUI standardized format
    • Data Exploration & Analysis: Using FiftyOne's interactive interface to visualize datasets, analyze action distributions, and understand annotation patterns
    • Multimodal Embeddings: Computing embeddings for screenshots and UI element patches to enable similarity search and retrieval
    • Model Inference: Running state-of-the-art models like Microsoft's GUI-Actor to predict interaction points from natural language instructions
    • Performance Evaluation: Measuring model accuracy using standard metrics and normalized click distance to assess localization precision
    • Failure Analysis: Investigating model failures through attention maps, error pattern analysis, and systematic debugging workflows
    • Data-Driven Improvement: Tagging samples based on error types (attention misalignment vs. localization errors) to prioritize fine-tuning efforts
    • Synthetic Data Generation: Using FiftyOne plugins to augment training data with synthetic task descriptions and variations
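
One plausible definition of the normalized click distance mentioned above: Euclidean distance between the predicted and ground-truth click points, normalized by the screenshot diagonal so scores compare across resolutions. The exact metric used in the workshop may differ; this is a sketch.

```python
import math

def normalized_click_distance(pred, target, width, height):
    # pred and target are (x, y) pixel coordinates; width and height are
    # the screenshot dimensions. Returns 0.0 for a perfect click and 1.0
    # for corner-to-corner misses.
    dx, dy = pred[0] - target[0], pred[1] - target[1]
    return math.hypot(dx, dy) / math.hypot(width, height)
```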

    About the Speaker

    Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, Agents, and Multimodal AI.

    13 attendees from this group
  • April 24 - Berlin AI, ML and Computer Vision Meetup


    MotionLab.Berlin, Bouchéstraße 12, Halle 20, Berlin, DE

    Join our in-person meetup on April 24th to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Register to reserve your seat. Space is limited!

    Date, Time and Location

    Apr 24, 2026
    5:30 PM - 8:30 PM

    MotionLab
    Bouchéstraße 12/Halle 20
    12435 Berlin

    Kaputt: A Large-Scale Dataset for Visual Defect Detection

    We present a novel large-scale dataset for defect detection in a logistics setting. Recent work on industrial anomaly detection has primarily focused on manufacturing scenarios with highly controlled poses and a limited number of object categories. Existing benchmarks like MVTec-AD (Bergmann et al., 2021) and VisA (Zou et al., 2022) have reached saturation, with state-of-the-art methods achieving up to 99.9% AUROC scores. In contrast to manufacturing, anomaly detection in retail logistics faces new challenges, particularly in the diversity and variability of object pose and appearance. Leading anomaly detection methods fall short when applied to this new setting.

    To bridge this gap, we introduce a new benchmark that overcomes the current limitations of existing datasets. With over 230,000 images (and more than 29,000 defective instances), it is 40 times larger than MVTec and contains more than 48,000 distinct objects. To validate the difficulty of the problem, we conduct an extensive evaluation of multiple state-of-the-art anomaly detection methods, demonstrating that they do not surpass 56.96% AUROC on our dataset. Further qualitative analysis confirms that existing methods struggle to leverage normal samples under heavy pose and appearance variation. With our large-scale dataset, we set a new benchmark and encourage future research towards solving this challenging problem in retail logistics anomaly detection. The dataset is available for download under https://www.kaputt-dataset.com.
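
For reference, the AUROC figures quoted above can be read as the probability that a randomly chosen defective sample receives a higher anomaly score than a randomly chosen normal one. A minimal rank-based computation (not the authors' evaluation code) makes that concrete:

```python
def auroc(scores, labels):
    # Mann-Whitney U formulation: fraction of (anomalous, normal) pairs
    # ranked correctly, counting ties as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

On this scale, 99.9% means near-perfect separation of defective from normal samples, while 56.96% is barely better than the 50% of random scoring.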

    About the Speaker

    Sebastian Höfer is an Applied Science Manager at Amazon Fulfillment Technologies & Robotics, leading machine learning and computer vision research for large-scale robotics and warehouse automation. He received his PhD from the Robotics & Biology Lab at TU Berlin, focusing on Sim2Real transfer and robotic perception. His recent work, “Kaputt: A Large-Scale Dataset for Visual Defect Detection” (ICCV 2025), established a major benchmark for industrial anomaly detection, reflecting his expertise at the intersection of academic research and real-world deployment.

    Data Foundations for Vision-Language-Action Models

    Model architectures get the papers, but data decides whether robots actually work. This talk introduces VLAs from a data-centric perspective: what makes robot datasets fundamentally different from image classification or video understanding, how the field is organizing its data (Open X-Embodiment, LeRobot, RLDS), and what evaluation benchmarks actually measure. We'll examine the unique challenges such as temporal structure, proprioceptive signals, and heterogeneity in embodiment, and discuss why addressing them matters more than the next architectural innovation.

    About the Speaker

    Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.

    Most AI Agents Are Broken. Let’s Fix That

    AI agents are having a moment, but most of them are little more than fragile prototypes that break under pressure. Together, we’ll explore why so many agentic systems fail in practice, and how to fix that with real engineering principles. In this talk, you’ll learn how to build agents that are modular, observable, and ready for production. If you’re tired of shiny agent demos that don't deliver, this talk is your blueprint for building agents that actually work.

    About the Speaker

    Bilge YĂĽcel is a Senior Developer Relations Engineer at deepset, helping developers build agentic AI apps with Haystack. Passionate about AI, she makes complex concepts approachable through hands-on tutorials, both online and at real-life events.

    A Spot Pattern is Like a Fingerprint: Jaguar Identification Kaggle Challenge

    The Jaguar Identification Project is a citizen science initiative engaging the public in conservation efforts in Porto Jofre, Brazil. The project increases awareness and provides an interesting and challenging dataset that requires fine-grained visual classification algorithms. We discuss its ongoing Kaggle Challenge and winning strategies based on dataset curation and representation learning. See https://www.kaggle.com/competitions/jaguar-re-id

    About the Speaker

    Antonio Rueda-Toicen is a machine learning consultant at Kineto.ai and a Researcher at the Artificial Intelligence and Intelligent Systems Chair at the Hasso Plattner Institute. He organizes the Berlin Computer Vision Group and is a certified instructor at NVIDIA's Deep Learning Institute.

    53 attendees

Group links

Organizers

Members

2,399