Part of AI, Machine Learning and Computer Vision Meetup Network - 52 groups

Valencia AI, Machine Learning and Computer Vision Meetup

5.0 • 2 ratings

About us

This group is for data scientists, machine learning engineers, and open source enthusiasts.

Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.

Upcoming events

Network event
July 22 - Best of ICRA
Wed, Jul 22 · 6:00 PM CEST
Online
122 attendees from 51 groups
The Best of ICRA is a three-day virtual meetup series featuring researchers presenting their accepted papers from the 2026 International Conference on Robotics and Automation (ICRA).

Date, Time and Location

Jul 22, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Contrastive Learning on 3D Point Clouds for Geometric Defect Detection

Reliable 3D defect detection in manufacturing is hard: the input is a point cloud — an unordered set that standard neural backbones cannot process directly; high-quality training data is scarce; and real scans are noisy and arrive in arbitrary orientations. We address these challenges in COSARAD, a contrastive learning framework that learns highly discriminative representations of object surface geometry under weak supervision.

When a test object arrives, we extract its features and compare them against a library of defect-free reference shapes for precise, interpretable defect localization — achieving state-of-the-art accuracy on industrial benchmarks such as Real3D-AD. In my talk, I'll cover the design choices behind the system, why contrastive representation learning is the right fit for sparse 3D data, and open problems in scaling inspection to production.
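
As a rough illustration of the comparison step described above, the sketch below scores each test feature by its distance to the nearest feature in a defect-free reference library and flags the ones above a threshold. It is not COSARAD's implementation: the function names, the Euclidean metric, the toy 2-D features, and the threshold value are all assumptions made for illustration.

```python
import math

def nearest_reference_distance(feature, reference_library):
    """Distance from one feature vector to its closest defect-free reference feature."""
    return min(math.dist(feature, ref) for ref in reference_library)

def localize_defects(test_features, reference_library, threshold):
    """Score every test feature and return the indices farther than `threshold` from all references."""
    scores = [nearest_reference_distance(f, reference_library) for f in test_features]
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return scores, flagged

# Toy data: hypothetical 2-D per-point features (real features would be learned embeddings).
reference_library = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_features = [(0.1, 0.0), (0.9, 0.1), (3.0, 3.0)]

scores, flagged = localize_defects(test_features, reference_library, threshold=0.5)
print(flagged)  # indices of features with no close match in the reference library
```

Because each score is tied to one feature index, a flagged index points back to a specific region of the scanned surface, which is one way such a comparison can stay interpretable.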

About the Speaker

Alexander Tarvo is a researcher at the University of Washington's MACS Lab, where he works on computer vision with applications in robotics. He holds a PhD in Software Engineering from Brown University and previously held research and engineering roles at Google, Microsoft, and IBM Research. His current research focuses on 3D vision and reinforcement learning for industrial robotics.

A Semantic and Occlusion-Aware Gaussian Mixture Probability Hypothesis Density Filter

Reliable and resilient multi-target tracking is foundational for safe autonomous driving, yet most perception pipelines struggle with sensor noise, heavy clutter, and severe environmental occlusions. To resolve these limitations, this talk presents a novel Semantic-Occlusion Aware (S-OA) Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter.

By combining geometric occlusion reasoning with deep learning-derived environmental semantics, the proposed framework adaptively initializes target tracking in regions where new targets are likely to appear. Evaluations demonstrate that this context-aware tracking system minimizes track initiation latency and preserves high tracking precision even under intense clutter.

Ultimately, this work demonstrates how embedding spatial and semantic structure into filtering yields a significantly more robust and resilient perception stack for autonomous navigation.

About the Speaker

Jovan Menezes is a PhD student at Cornell University, advised by Prof. Mark Campbell. His research focuses on developing scalable and resilient perception algorithms for autonomous driving. By leveraging concepts from probabilistic estimation and deep learning-based computer vision, he aims to enable autonomous vehicles to perceive and navigate in challenging environments.

An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Autonomous robots struggle to detect objects in unstructured fields, requiring in-domain tuning with laborious manual data collection. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data.

Our method combines cross-modal annotation transfer, early sensor fusion, and a multi-stage detection architecture to train and enhance multi-modal detection. Validated on vineyard trunk detection and paired with a custom LOAM algorithm, it localized over 70% of trees in one pass with under 0.37 m mean error.

Our system demonstrated that robust detection is achievable even with minimal initial annotations and human intervention.

About the Speaker

Dimitrios Chatziparaschis is a PhD candidate in EE at the University of California, Riverside. His main research lies at the intersection of computer vision, machine learning, and robotics. His main topics include 3D perception, multi-modal sensing, landmark detection, and localization in outdoor and dynamic settings.

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

We introduce vS-Graphs, a novel real-time VSLAM framework that integrates vision-based scene understanding with map reconstruction and comprehensible graph-based representation. The framework infers structural elements (i.e., rooms and floors) from detected building components (i.e., walls and ground surfaces) and incorporates them into optimizable 3D scene graphs.

This solution enhances the reconstructed map's semantic richness, comprehensibility, and localization accuracy.

About the Speaker

Ali Tourani is an R&D Specialist and a Senior Software Engineer with 8+ years of experience in practical computer vision and AI system design and deployment. Currently, he holds a Postdoctoral Research Associate position at the University of Luxembourg, where he develops vision-language models and generative AI solutions for real-world robotic applications.
1 attendee from this group
Network event
July 23 - AI, ML, and Computer Vision Meetup
Thu, Jul 23 · 6:00 PM CEST
Online
248 attendees from 48 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Date, Time and Location

Jul 23, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity

The domain of automatic video trailer generation is currently undergoing a profound paradigm shift, transitioning from heuristic-based extraction methods to deep generative synthesis. While early methodologies relied heavily on low-level feature engineering, visual saliency, and rule-based heuristics to select representative shots, recent advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and diffusion-based video synthesis have enabled systems that not only identify key moments but also construct coherent, emotionally resonant narratives.

This survey provides a comprehensive technical review of this evolution, with a specific focus on generative techniques including autoregressive Transformers, LLM-orchestrated pipelines, and text-to-video foundation models like OpenAI's Sora and Google's Veo. We analyze the architectural progression from Graph Convolutional Networks (GCNs) to Trailer Generation Transformers (TGT), evaluate the economic implications of automated content velocity on User-Generated Content (UGC) platforms, and discuss the ethical challenges posed by high-fidelity neural synthesis.

By synthesizing insights from recent literature, this report establishes a new taxonomy for AI-driven trailer generation in the era of foundation models, suggesting that future promotional video systems will move beyond extractive selection toward controllable generative editing and semantic reconstruction of trailers.

About the Speaker

Abhishek Dharmaratnakar is an Engineering Leader at Google leading YouTube Premium. His work focuses on the intersection of hyperscale media infrastructure and generative artificial intelligence, directing cross-functional engineering organizations to redefine how billions of users consume and create content.

Making Agent Systems Observable, Reliable, and Testable

In this talk, I’ll share practical lessons from building real agent systems in computer vision workflows, focusing on how to design evaluation loops, observability pipelines, and sandboxed environments that make agents reliable in practice. We’ll explore how to measure behavior end-to-end, test components independently, and build feedback loops that help agents improve over time, even as tools, models, and pipelines evolve. This talk is for engineers and builders who want to move beyond demos and learn how to make agent systems production-ready.

About the Speaker

Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.

Training-Free Object and Associated Effect Removal in Videos

I will be presenting our recent work, Object-WIPER, which focuses on removing objects and their associated effects from videos. Instead of fine-tuning models for each editing task, our method reuses the priors of pre-trained text-to-video models to perform object and effect removal in a training-free manner. We also curate a real-world associated-effect benchmark and evaluation metric for more realistic assessment of video object removal.

About the Speaker

Saksham Singh Kushwaha is a candidate at UT Dallas, with research interests in audio-visual learning, spatial audio, and computer vision. He received his master’s degree from NYU and his bachelor’s degree from IIT Delhi.

Turning Models into Systems: AI Architecture That Works

This talk explores what it really takes to make "intelligent systems" work in the messy, high-stakes reality of production environments – not just in demos or pilots. Most AI initiatives do not fail because the algorithms are weak, but because the surrounding system is not designed to handle uncertainty, change, and operational demands.

The session shows how to separate the concerns of building and improving models from their use in daily operations, and how to create a stable core of rules, safety, and business meaning around which smarter components can evolve.

Instead of treating AI as a magic add-on, the talk frames it as a capability that must be grounded in the organization's language, workflows, and responsibilities. It demonstrates how to design that core so that new models, tools, and data sources can be plugged in, compared, and replaced without breaking trust.

Attendees will leave with a clear mental model and a set of practical design ideas for turning clever prototypes into robust, understandable, and adaptable intelligent systems that people on the ground are willing to rely on.

About the Speaker

Dr. Nikita Golovko is a seasoned Solution Architect with over 16 years of experience in designing scalable, secure, and cost-effective software architectures for industrial and business-critical systems.
Network event
July 29 - MCP, Agents and Skills Meetup
Wed, Jul 29 · 6:00 PM CEST
Online
372 attendees from 48 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across MCP, Agents and Skills.

Date, Time and Location

Jul 29, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

The Agent Control Plane: Turning Coding Agents into Reliable Engineering Workflows

AI coding agents are powerful but often unreliable — they hallucinate, lose context, and produce inconsistent results across runs. In this talk, Alex introduces Atomic, an open-source control plane that adds persistent memory, deterministic workflow phases (Research → Specify → Implement → Ship), and human-in-the-loop gates around coding agents like Claude Code and GitHub Copilot. The result: repeatable, auditable engineering workflows that teams can actually trust in production.
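
To make the idea of fixed phases with human gates concrete, here is a minimal sketch of a phased runner. It does not use Atomic's SDK or reflect its actual API: `run_workflow`, `run_phase`, `approve`, and the stub agent and reviewer are hypothetical names introduced only for this illustration.

```python
from enum import Enum

class Phase(Enum):
    RESEARCH = "Research"
    SPECIFY = "Specify"
    IMPLEMENT = "Implement"
    SHIP = "Ship"

PHASE_ORDER = [Phase.RESEARCH, Phase.SPECIFY, Phase.IMPLEMENT, Phase.SHIP]

def run_workflow(run_phase, approve):
    """Run phases in a fixed order and stop at the first phase a reviewer rejects.

    run_phase(phase, memory) -> artifact   (hypothetical agent call)
    approve(phase, artifact) -> bool       (hypothetical human-in-the-loop gate)
    """
    memory = {}  # approved artifacts, carried forward to later phases
    log = []     # audit trail of (phase name, approved) decisions
    for phase in PHASE_ORDER:
        artifact = run_phase(phase, memory)
        approved = approve(phase, artifact)
        log.append((phase.value, approved))
        if not approved:
            break
        memory[phase.value] = artifact
    return memory, log

# Stub agent and reviewer, for demonstration only.
def fake_agent(phase, memory):
    return f"{phase.value} output (saw {len(memory)} prior artifacts)"

def reviewer(phase, artifact):
    return phase is not Phase.IMPLEMENT  # reject at Implement to show the gate

memory, log = run_workflow(fake_agent, reviewer)
print(log)
```

Keeping the phase order in a plain list and logging every gate decision is what makes a run repeatable and auditable in this sketch; the agent itself can stay non-deterministic inside each phase.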

About the Speaker

Alex Lavaee is an Applied AI engineer at Microsoft Research and the creator of Atomic, an open-source SDK that wraps deterministic, research-to-execution workflows around AI coding agents. He previously conducted AI research at Harvard Medical School and Boston University, and has worked as an MLE and data scientist at companies including Boeing and Themis AI, an MIT CSAIL spinoff.

UISurf: Toward Universal UI Automation with Cross-Environment Agents

In this talk, we introduce UISurf, an open-source multimodal agentic UI automation platform in which agents can perceive, reason, and collaborate across browser and desktop environments to complete end-to-end tasks that require interaction with multiple user interfaces.

UISurf comprises three main components: uisurf-agent, the runtime for UI automation agents; uisurf-admin, the session orchestration and management service; and uisurf-app, the full-stack user application. Its multi-agent architecture includes a planning_agent that transforms natural-language requests into structured execution plans, specialized Browser and Desktop Agents for environment-specific interaction, an automation_agent that coordinates execution and inter-agent handoff through Agent-to-Agent (A2A) communication, and a summarization_agent that produces the final task summary for the user. UISurf supports both autonomous execution and human-in-the-loop supervision, offering a practical and extensible framework for studying and deploying cross-environment UI automation.
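
The division of labour among these agents can be pictured with a toy pipeline. This is only a sketch under loud assumptions: it does not use UISurf's code, the agents are plain functions, the planner is a keyword rule rather than a language model, and the Agent-to-Agent (A2A) handoff is replaced by ordinary function calls.

```python
def planning_agent(request):
    """Turn a request into ordered steps tagged with a target environment (toy rule-based planner)."""
    steps = []
    for part in request.split(" then "):
        env = "desktop" if "desktop" in part else "browser"
        steps.append({"env": env, "action": part.strip()})
    return steps

def browser_agent(step):
    return f"[browser] did: {step['action']}"

def desktop_agent(step):
    return f"[desktop] did: {step['action']}"

def automation_agent(plan):
    """Dispatch each step to the matching environment agent and collect results in order."""
    handlers = {"browser": browser_agent, "desktop": desktop_agent}
    return [handlers[step["env"]](step) for step in plan]

def summarization_agent(results):
    return f"Completed {len(results)} step(s): " + "; ".join(results)

plan = planning_agent("download the report then open it in the desktop viewer")
results = automation_agent(plan)
summary = summarization_agent(results)
print(summary)
```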

About the Speaker

Dr. Henry Ruiz is a Research Scientist at Texas A&M University @ AgriLife Research, specializing in Artificial Intelligence (AI) and Remote Sensing. His work focuses on the development of advanced software systems and computational algorithms for analyzing multi-source remote sensing data, including satellite imagery, UAVs (Unmanned Aerial Vehicles), LiDAR (Light Detection and Ranging), and Ground Penetrating Radar (GPR).

From Manual Workflows to AI-Assisted Skills: Building Reliable Internal Automation

In this session, I will discuss how teams can turn repetitive manual workflows into reliable AI-assisted and automation-driven “skills.” I will share practical lessons from building internal tools for CAD and engineering workflows, including how automation can reduce manual effort, improve consistency, and support better process control. The talk will also cover why many AI/agent experiments fail when they are not connected to real team workflows, standards, and validation steps. Attendees will walk away with a practical framework for identifying repeatable workflows, designing useful internal tools, and adopting AI assistance without losing accuracy or trust.

About the Speaker

Janvi Vijaykumar Saddi is a Computer Science graduate and CAD/Data Automation professional with experience in data center design workflows, AutoCAD automation, process improvement, and data analytics. She currently works as a CAD Tech 2 at Astreya, supporting Google data center design workflows by building internal tools that reduce manual effort, improve accuracy, and streamline engineering processes. Her background also includes SQL, Power BI, market research analytics, and AI-assisted development.

Building Safe Agent Sandboxes: Let Agents Act Without Breaking Production

AI agents become truly useful when they can take action, not just generate text. But giving agents access to code, data, and systems raises an important question: how do you let them explore, execute, fail, and improve without putting production at risk?

In this talk, we'll explore the sandbox pattern for agent systems and how to equip agents with tools to read, write, execute, and iterate within controlled environments while using permissions, human approval, and safety guardrails to keep them reliable. We'll cover practical architectures and lessons learned for building agents that can safely evolve from experimentation to production.
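
One small piece of such a pattern can be sketched as a file tool that is confined to a single directory and asks for approval before every write. This is an illustrative assumption, not the speaker's architecture: the `Sandbox` class and the `approve_write` hook are hypothetical, and a real sandbox would also need process, network, and resource isolation.

```python
import tempfile
from pathlib import Path

class Sandbox:
    """Confine file tools to one directory and gate writes behind an approval callback."""

    def __init__(self, root, approve_write):
        self.root = Path(root).resolve()
        self.approve_write = approve_write  # hypothetical human-approval hook

    def _resolve(self, relative_path):
        target = (self.root / relative_path).resolve()
        # Reject paths that escape the sandbox root (for example via "..").
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"path escapes sandbox: {relative_path}")
        return target

    def read(self, relative_path):
        return self._resolve(relative_path).read_text()

    def write(self, relative_path, text):
        target = self._resolve(relative_path)
        if not self.approve_write(relative_path, text):
            raise PermissionError(f"write not approved: {relative_path}")
        target.write_text(text)

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in approval policy: only .txt files are allowed.
    box = Sandbox(tmp, approve_write=lambda path, text: path.endswith(".txt"))
    box.write("notes.txt", "hello")
    content = box.read("notes.txt")
    try:
        box.write("../outside.txt", "nope")
        escaped = True
    except PermissionError:
        escaped = False
    try:
        box.write("run.sh", "echo hi")
        unapproved = True
    except PermissionError:
        unapproved = False
```

Both failure cases raise before anything touches the disk, so the agent can fail and retry without side effects outside the temporary directory.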

About the Speaker

Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.
3 attendees from this group
Network event
Aug 4 - Visual AI in Manufacturing
Tue, Aug 4 · 6:00 PM CEST
Online
95 attendees from 52 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics at the intersection of manufacturing, AI, ML, and computer vision.

Date, Time and Location

Aug 04, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Enabling Multimodal Agents on the Edge

The next generation of AI agents is moving beyond cloud-based text-only models and will interact with the physical multimodal world in real time. For example, in the vision domain, AI agents rely on Vision-Language Models (VLMs) as their backbone. However, deploying massive VLMs with billions of parameters on edge devices remains a significant engineering hurdle.

Drawing on our recent ICML and CVPR research papers, this session explores advancements in agentic model optimizations, specifically how distillation and pruning transform 'heavyweight' models into lean, edge-ready engines. Lastly, I will present a UI agent, under development by our lab's team, running on an actual phone.

About the Speaker

Denis Gudovskiy is a Distinguished AI Engineer at Panasonic North America, where he conducts R&D on various core AI methods, including multimodal and hardware-efficient agents, supervised and RL training pipelines, and robustness to out-of-distribution scenarios.

When the Camera Can’t Be Trusted: Health-Aware Visual AI for Reliable Near-Miss Detection

Near-miss detection systems are often evaluated as though every camera frame is equally trustworthy, even though blur, poor exposure, occlusion, contamination, and changing lighting can silently degrade the visual evidence used to make safety decisions. This talk presents an online camera-health framework that estimates visual reliability before downstream perception performance significantly deteriorates.

I will discuss how camera-health signals can support condition-aware evaluation, prioritize human review, reduce unreliable alerts, and trigger appropriate fallback behavior. Drawing from research in safety-critical visual perception, the talk will demonstrate how these principles can be adapted to industrial video systems operating across different cameras, shifts, layouts, and environmental conditions.
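
As a hedged illustration of what a per-frame health signal might look like, the sketch below combines a sharpness estimate (variance of a Laplacian response) with a mean-brightness check. These two cues, the thresholds, and the function names are assumptions chosen for the example; the framework presented in the talk may use entirely different signals.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels; low values suggest blur."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def camera_health(gray, blur_floor=50.0, dark=30, bright=225):
    """Return (is_healthy, reasons) from simple sharpness and exposure checks."""
    pixels = [p for row in gray for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    reasons = []
    if laplacian_variance(gray) < blur_floor:
        reasons.append("blurry")
    if mean_brightness < dark:
        reasons.append("underexposed")
    elif mean_brightness > bright:
        reasons.append("overexposed")
    return (not reasons), reasons

# Synthetic 6x6 frames with 0-255 grayscale values.
sharp = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]  # checkerboard
flat_dark = [[10] * 6 for _ in range(6)]                                    # uniform and dark

sharp_ok, sharp_reasons = camera_health(sharp)
dark_ok, dark_reasons = camera_health(flat_dark)
```

A downstream detector could skip alerting, or route the frame to human review, whenever the returned flag is false.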

The presentation will also connect camera-health monitoring with rare-event discovery and failure-driven dataset improvement for more trustworthy near-miss detection.

About the Speaker

Shiva Aher is a computer vision researcher with a graduate background in computer science from the Georgia Institute of Technology, specializing in artificial intelligence.

Agentic VLM Applications in Manufacturing

Vision Language Models (VLMs) introduce net-new functionality to vision workloads in manufacturing that traditional computer vision models simply do not offer (e.g., open-vocabulary detection, in-context learning). Even so, fine-tuned models like YOLO offer a level of precision and recall that today's VLMs struggle to match out of the box.

Through agentic harnesses that coordinate calls to VLMs, we can start to deliver similar reliability on manufacturing-relevant tasks (e.g., many-class, many-instance detection), while also supporting the net-new functionality (e.g., multimodal search) that makes VLMs distinct. In this talk, we walk through the design of these harnesses, how you serve them efficiently, and how they deliver value in manufacturing.
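
One simple way a harness could coordinate calls for many-class, many-instance detection is to query the model one class at a time and then suppress overlapping duplicates. The sketch below shows that idea only; it is not the speaker's design, and `query_vlm`, `fake_vlm`, and the canned detections are hypothetical stand-ins for a real model call.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def detect_many_classes(image, class_names, query_vlm, iou_threshold=0.5):
    """Ask the model about one class at a time, then drop overlapping duplicates per class."""
    kept = []
    for name in class_names:
        # query_vlm is a hypothetical callable returning [(box, score), ...] for one class.
        candidates = sorted(query_vlm(image, name), key=lambda d: d[1], reverse=True)
        class_kept = []
        for box, score in candidates:
            if all(iou(box, other) < iou_threshold for other, _ in class_kept):
                class_kept.append((box, score))
        kept.extend((name, box, score) for box, score in class_kept)
    return kept

# Stub standing in for a real VLM call; returns canned detections.
def fake_vlm(image, class_name):
    canned = {
        "bolt": [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.6), ((20, 20, 30, 30), 0.8)],
        "washer": [((50, 50, 60, 60), 0.7)],
    }
    return canned.get(class_name, [])

detections = detect_many_classes(
    image=None, class_names=["bolt", "washer", "nut"], query_vlm=fake_vlm
)
```

Splitting the request by class keeps each model call narrow, at the cost of more calls per image, which is where efficient serving starts to matter.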

About the Speaker

Subraiz Ahmed is a member of the Technical Staff at Perceptron AI. He builds the infrastructure to serve frontier vision models. He previously founded a series of startups.