Part of AI, Machine Learning and Computer Vision Meetup Network - 52 groups

Mumbai AI, Machine Learning and Computer Vision Meetup

4.8•25 ratings

About us

🖖 This group is for data scientists, machine learning engineers, and open source enthusiasts.

Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.

Are you interested in speaking at a future Meetup?
Is your company interested in sponsoring a Meetup?

Send me a DM on LinkedIn

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events

Network event
July 20 - Best of ICRA (Day One)
Mon, Jul 20 · 9:30 PM IST
Online
88 attendees from 52 groups
The Best of ICRA is a three-day virtual meetup series featuring researchers presenting their accepted papers from the 2026 International Conference on Robotics and Automation (ICRA).

Register for the Zoom to get access to all three days of the Best of ICRA.

Each session features a curated lineup of speakers sharing cutting-edge research across robotics, computer vision, and AI — straight from papers accepted at one of the field's top conferences.

Towards Versatile Opti-Acoustic Sensor Fusion and Volumetric Mapping for Safe Underwater Navigation

Accurate sensing and mapping are critical for autonomous underwater vehicles operating in obstacle-rich environments. Vision provides high-resolution data but fails in turbid water, whereas sonar is robust to turbidity but suffers from low resolution and elevation ambiguity.

To overcome these limitations, our recent work introduces an opti-acoustic sensor fusion framework that pairs a monocular camera with a stereo sonar to resolve elevation ambiguity and produce fully defined 3D point clouds. These multi-modal points are then fused using a confidence-weighted Gaussian Process Volumetric Mapping framework that prioritizes high-confidence, safety-critical data.
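As a rough illustration of what "confidence-weighted" fusion means, the sketch below combines two range readings of the same point with a plain confidence-weighted average. This is a much simpler stand-in for the Gaussian Process mapping described in the abstract, not the speaker's method; the function name `fuse_weighted`, the sample readings, and the confidence values are all hypothetical.

```python
import numpy as np

def fuse_weighted(values, confidences):
    """Confidence-weighted average of several estimates of one quantity.

    Hypothetical helper for illustration only; a stand-in for, not an
    implementation of, Gaussian Process volumetric mapping.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("confidences must be non-negative and not all zero")
    # Higher-confidence readings pull the fused estimate toward themselves.
    return float(np.sum(weights * values) / np.sum(weights))

# Assumed toy readings: a high-confidence range of 2.0 m and a
# low-confidence range of 2.6 m for the same point.
fused = fuse_weighted([2.0, 2.6], [0.9, 0.1])
print(fused)
```

The fused value stays close to the high-confidence reading, which is the behavior a safety-critical map would want when one sensor is degraded.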

Ultimately, field trials and experimental results validate that this framework successfully captures complex geometries to ensure reliable navigation under degraded sensing conditions.

About the Speaker

Ivana Collado Gonzalez is a Ph.D. candidate at Stevens Institute of Technology, holding an M.S. in Robotics from Stevens and a B.S. in Mechatronics Engineering from Tecnológico de Monterrey, Mexico. Her industry experience includes developing autonomous mobile robots at Xlab Protexa R&D. Ivana’s research focuses on mobile robot exploration, localization, and mapping, specifically advancing marine robotics and perception within complex underwater environments.

Teaching Drones to See What Matters with Reinforcement Learning

Autonomous inspection of industrial environments requires robots to identify and prioritize specific objects of interest, rather than exhaustively exploring their surroundings. This talk presents a deep reinforcement learning framework that enables aerial robots to locate and visually inspect semantic targets while navigating collision-free in unknown environments, using only onboard sensors.

The policy generalizes from training on primitive shapes to inspecting complex structures in real-world settings. The talk also covers a second line of work on active perception, where the flying agent learns to actively steer its camera sensor during navigation to maximize situational awareness.

Together, these approaches push toward truly autonomous robots that understand not just where to go, but what to look at.

About the Speaker

Grzegorz Malczyk is a PhD candidate at the Autonomous Robots Lab, Norwegian University of Science and Technology (NTNU), where he researches reinforcement learning for autonomous robotic navigation and inspection of industrial environments. He holds an MSc in Robotics, Systems and Control from ETH Zurich and has published across IEEE RA-L, ICRA, and IROS. His work spans semantics-aware path planning, active perception, and sim-to-real transfer for aerial robots.

Gameplay With a Socially Supportive Virtual Robot Enhances Children’s Global Self-Esteem, Peer Relationships, Interest and Engagement

This work investigates whether a socially supportive virtual robot can enhance children's self-esteem and social engagement through game-based interactions. We conducted a month-long study with 23 children in India, where participants played a video game with or without a virtual robot that provided positive reinforcement.
Our results showed that children interacting with the robot demonstrated significant improvements in global self-esteem, friendship quality and quantity, and sustained motivation and enjoyment. These findings highlight the potential of socially supportive virtual robots as tools for promoting children's psychological well-being and social development.

About the Speaker

Devasena Pasupuleti is a researcher in Human-Robot Interaction at the University of Osaka in Japan. Her research focuses on social robotics, conversational AI, and game-based technologies to support and assess children's well-being, learning, and social development. She has authored over 20 publications in leading IEEE and ACM conferences and journals, received multiple Best Presenter and Best Research awards, and is a frequent guest speaker at international research events. Outside her research, she is an author of children's books, reflecting her passion for making robotics accessible and engaging for young audiences.

Safe and Stable Neural Dynamical Systems for Robust Motion Planning

Learning safe and stable robot motions from demonstrations remains a challenge, especially in complex, nonlinear tasks involving dynamic, obstacle-rich environments. In this talk, I will present neural dynamical systems that can help in achieving robust motion plans directly from robot demonstrations.

Moreover, in environments with static obstacles, the framework will yield safe motions with certified confidence bounds.

About the Speaker

Mahathi Anand is a postdoctoral researcher at the Learning Systems and Robotics Lab, Technical University of Munich. She obtained her PhD (Dr. rer. nat.) from LMU Munich, Germany, in 2023. Prior to that, she graduated with a Bachelor of Technology in Electrical and Electronics Engineering from SRM Institute of Science and Technology, India, in 2016 and completed her Master of Technology in Electrical Engineering with a specialization in System and Control from the Indian Institute of Technology Roorkee, India, in 2019.
5 attendees from this group
Network event
July 21 - Best of ICRA
Tue, Jul 21 · 9:30 PM IST
Online
133 attendees from 51 groups
The Best of ICRA is a three-day virtual meetup series featuring researchers presenting their accepted papers from the 2026 International Conference on Robotics and Automation (ICRA).

Date, Time and Location

Jul 21, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Outdoor Robot Navigation in the Unstructured World: From Traversability to Physical Scene Understanding

Outdoor robot navigation in the unstructured world requires robots to reason about more than obstacles: they must understand where they can move, what terrain is suitable, and how scene context affects navigation decisions. In sidewalks, campuses, trails, and off-road environments, these decisions depend on geometric structure, terrain conditions, semantic cues, and robot-environment interaction.

In this talk, I will present our recent work on scene understanding for outdoor navigation, including a large-scale multimodal dataset for studying outdoor traversability, approaches for trajectory generation and selection, vision-language reasoning for contextual navigation, and Gaussian-based 3D scene modeling. I will also discuss how physical reasoning can extend scene understanding from visual and geometric perception toward terrain properties and interaction cues.

Together, these works explore how robots can better interpret unstructured outdoor environments and use that understanding for navigation decision-making.

About the Speaker

Jing Liang is a postdoctoral researcher at the Stanford Robotics Center, working on robot navigation, perception, and human-centered autonomy in complex real-world environments.

Scene Graphs and the Future of Mapping

In this talk, I will question whether 3D reconstruction is still a necessary part of mapping in the age of feedforward models and present some alternatives. I will then discuss scene graphs as an alternative map representation and their applications in mobile manipulation.

About the Speaker

Hermann Blum is a Junior Professor at the University of Bonn and the Lamarr Institute. Hermann's research focuses on machine learning for robotic perception and scene understanding, developing models and methods to understand an agent's environment semantically and geometrically.

Toward Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices

Robust 6D pose estimation of textured objects under diverse illumination conditions remains a significant challenge, often requiring a trade-off between accurate initial pose estimation and efficient real-time tracking. We present a unified framework explicitly designed for efficient execution on edge devices, which fuses a robust initial estimation module with a fast motion-based tracker.

The key to our approach is a shared, lighting-invariant color-pair feature representation that forms a consistent foundation for both stages. For initial estimation, this representation facilitates robust registration between the live RGB-D view and the object's 3D mesh.

For tracking, the same representation validates temporal correspondences, enabling a lightweight model to reliably regress the object's pose. Experiments on benchmark datasets demonstrate that our integrated approach is both effective and robust, providing competitive pose estimation accuracy while maintaining high-fidelity tracking even through abrupt pose changes.

This is joint work with Xingjian Yang.

About the Speaker

Ashis Banerjee is an Associate Professor of Industrial & Systems Engineering and Mechanical Engineering at the University of Washington, Seattle. Prior to joining UW, he was a Research Scientist at GE Global Research and a Postdoctoral Associate at MIT.

Trustworthy Geometric Perception: Certifiable Optimization and Robust Estimation

Autonomous robots in safety-critical settings require geometric perception that is not merely accurate on average, but provably correct under adversarial conditions. Yet most pipelines rely on local optimization methods that fail silently when poorly initialized.

This talk presents GlobustVP, a certifiably optimal vanishing point estimator that reformulates joint VP localization and line association as a quadratically constrained quadratic program (QCQP) and relaxes it to a tight semidefinite program (SDP), achieving the first globally optimal and outlier-robust solution to this problem. Recognized as a Best Paper Award Candidate at CVPR 2025 (top 0.1%, 14 of 13,008 submissions), GlobustVP demonstrates that certifiable global optimization is both practically feasible and highly competitive.
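For readers unfamiliar with the QCQP-to-SDP step, the standard lifting argument can be sketched as follows. This is the generic textbook relaxation written with assumed symbols ($x$, $Q_i$, $b_i$, $X$); it is not the specific GlobustVP formulation, which the abstract does not spell out.

```latex
% Generic equality-constrained QCQP (symbols are illustrative assumptions)
\min_{x \in \mathbb{R}^n} \; x^\top Q_0 x
\quad \text{s.t.} \quad x^\top Q_i x = b_i, \qquad i = 1, \dots, m.

% Lift with X = x x^\top, using x^\top Q x = \operatorname{tr}(Q X):
\min_{X \in \mathbb{S}^n} \; \operatorname{tr}(Q_0 X)
\quad \text{s.t.} \quad \operatorname{tr}(Q_i X) = b_i, \quad
X \succeq 0, \quad \operatorname{rank}(X) = 1.

% Dropping the non-convex rank constraint leaves a convex SDP:
\min_{X \in \mathbb{S}^n} \; \operatorname{tr}(Q_0 X)
\quad \text{s.t.} \quad \operatorname{tr}(Q_i X) = b_i, \quad X \succeq 0.
```

The relaxation is called tight when its optimal solution $X^\star$ has rank one. In that case $X^\star = x^\star x^{\star\top}$, and $x^\star$ is a globally optimal solution of the original QCQP.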

More broadly, this work is part of a research program toward trustworthy geometric perception: systems that know when they are wrong, and can communicate that to the robots and humans that depend on them.

About the Speaker

Zhenjun Zhao is a postdoctoral researcher at the University of Zaragoza, working with Javier Civera.
1 attendee from this group
Network event
July 22 - Best of ICRA
Wed, Jul 22 · 9:30 PM IST
Online
109 attendees from 51 groups
The Best of ICRA is a three-day virtual meetup series featuring researchers presenting their accepted papers from the 2026 International Conference on Robotics and Automation (ICRA).

Date, Time and Location

Jul 22, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Contrastive Learning on 3D Point Clouds for Geometric Defect Detection

Reliable 3D defect detection in manufacturing is hard: the input is a point cloud — an unordered set that standard neural backbones cannot process directly; high-quality training data is scarce; and real scans are noisy and arrive in arbitrary orientations. We address these challenges in COSARAD, a contrastive learning framework that learns highly discriminative representations of object surface geometry under weak supervision.

When a test object arrives, we extract its features and compare them against a library of defect-free reference shapes for precise, interpretable defect localization — achieving state-of-the-art accuracy on industrial benchmarks such as Real3D-AD. In my talk, I'll cover the design choices behind the system, why contrastive representation learning is the right fit for sparse 3D data, and open problems in scaling inspection to production.
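The "compare against a library of defect-free references" idea can be sketched as a nearest-neighbor distance in feature space. The example below is a minimal sketch under assumptions, not the COSARAD pipeline: the 2-D feature vectors, the function name `anomaly_scores`, and the 0.5 threshold are all made up for illustration, and real features would come from a learned encoder.

```python
import numpy as np

def anomaly_scores(test_feats, ref_feats):
    """Distance from each test feature to its nearest reference feature.

    Hypothetical helper: larger scores mean the local geometry looks less
    like anything seen in the defect-free reference library.
    """
    # Pairwise differences, shape (n_test, n_ref, feature_dim).
    diff = test_feats[:, None, :] - ref_feats[None, :, :]
    # Squared Euclidean distances, shape (n_test, n_ref).
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    return np.sqrt(sq_dist.min(axis=1))

# Assumed toy features for a defect-free library and a test object.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
test = np.array([[0.0, 0.1], [1.0, 0.0], [3.0, 4.0]])

scores = anomaly_scores(test, ref)
flagged = scores > 0.5  # assumed threshold; would be tuned in practice
print(scores, flagged)
```

Because each score is tied to a specific point's feature, thresholding the scores gives a per-point defect mask, which is one way to obtain localized output from this kind of comparison.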

About the Speaker

Alexander Tarvo is a researcher at the University of Washington's MACS Lab, where he works on computer vision with applications in robotics. He holds a PhD in Software Engineering from Brown University and previously held research and engineering roles at Google, Microsoft, and IBM Research. His current research focuses on 3D vision and reinforcement learning for industrial robotics.

A Semantic and Occlusion-Aware Gaussian Mixture Probability Hypothesis Density Filter

Reliable and resilient multi-target tracking is foundational for safe autonomous driving, yet most perception pipelines frequently struggle with sensor noise, heavy clutter, and severe environmental occlusions. To resolve these limitations, this talk presents a novel Semantic-Occlusion Aware (S-OA) Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter.

By combining geometric occlusion reasoning with deep learning-derived environmental semantics, the proposed framework adaptively initializes target tracking in regions where new targets are likely to appear. Evaluations demonstrate that this context-aware tracking system minimizes track initiation latency and preserves high tracking precision even under intense clutter.

Ultimately, this work demonstrates how embedding spatial and semantic structure into filtering yields a significantly more robust and resilient perception stack for autonomous navigation.

About the Speaker

Jovan Menezes is a PhD student at Cornell University, advised by Prof. Mark Campbell. His research focuses on developing scalable and resilient perception algorithms for autonomous driving. By leveraging concepts from probabilistic estimation and deep learning-based computer vision, the goal is to enable autonomous vehicles to perceive and navigate in challenging environments.

An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Autonomous robots struggle to detect objects in unstructured fields, requiring in-domain tuning with laborious manual data collection. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data.

Our method combines cross-modal annotation transfer, early sensor fusion, and a multi-stage detection architecture to train and enhance multi-modal detection. Validated on vineyard trunk detection and paired with a custom LOAM algorithm, it localised over 70% of trees in one pass with under 0.37 m mean error.

Our system demonstrated that robust detection is achievable even with minimal initial annotations and human intervention.

About the Speaker

Dimitrios Chatziparaschis is a PhD candidate in EE at the University of California, Riverside. His research lies at the intersection of computer vision, machine learning, and robotics. His main topics include 3D perception, multi-modal sensing, landmark detection, and localization in outdoor and dynamic settings.

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

We introduce vS-Graphs, a novel real-time VSLAM framework that integrates vision-based scene understanding with map reconstruction and comprehensible graph-based representation. The framework infers structural elements (i.e., rooms and floors) from detected building components (i.e., walls and ground surfaces) and incorporates them into optimizable 3D scene graphs.

This solution enhances the reconstructed map's semantic richness, comprehensibility, and localization accuracy.

About the Speaker

Ali Tourani is an R&D Specialist and a Senior Software Engineer with 8+ years of experience in practical computer vision and AI system design and deployment. Currently, he holds a Postdoctoral Research Associate position at the University of Luxembourg, where he develops vision-language models and generative AI solutions for real-world robotic applications.
2 attendees from this group
Network event
July 23 - AI, ML, and Computer Vision Meetup
Thu, Jul 23 · 9:30 PM IST
Online
221 attendees from 48 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Date, Time and Location

Jul 23, 2026
9:00 AM - 11:00 AM PDT
Online. Register for the Zoom!

Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity

The domain of automatic video trailer generation is currently undergoing a profound paradigm shift, transitioning from heuristic-based extraction methods to deep generative synthesis. While early methodologies relied heavily on low-level feature engineering, visual saliency, and rule-based heuristics to select representative shots, recent advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and diffusion-based video synthesis have enabled systems that not only identify key moments but also construct coherent, emotionally resonant narratives.

This survey provides a comprehensive technical review of this evolution, with a specific focus on generative techniques including autoregressive Transformers, LLM-orchestrated pipelines, and text-to-video foundation models like OpenAI's Sora and Google's Veo. We analyze the architectural progression from Graph Convolutional Networks (GCNs) to Trailer Generation Transformers (TGT), evaluate the economic implications of automated content velocity on User-Generated Content (UGC) platforms, and discuss the ethical challenges posed by high-fidelity neural synthesis.

By synthesizing insights from recent literature, this report establishes a new taxonomy for AI-driven trailer generation in the era of foundation models, suggesting that future promotional video systems will move beyond extractive selection toward controllable generative editing and semantic reconstruction of trailers.

About the Speaker

Abhishek Dharmaratnakar is an Engineering Leader at Google leading YouTube Premium. His work focuses on the intersection of hyperscale media infrastructure and generative artificial intelligence, directing cross-functional engineering organizations to redefine how billions of users consume and create content.

Making Agent Systems Observable, Reliable, and Testable

In this talk, I’ll share practical lessons from building real agent systems in computer vision workflows, focusing on how to design evaluation loops, observability pipelines, and sandboxed environments that make agents reliable in practice. We’ll explore how to measure behavior end-to-end, test components independently, and build feedback loops that help agents improve over time, even as tools, models, and pipelines evolve. This talk is for engineers and builders who want to move beyond demos and learn how to make agent systems production-ready.

About the Speaker

Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.

Training-Free Object and Associated Effect Removal in Videos

I will be presenting our recent work, Object-WIPER, which focuses on removing objects and their associated effects from videos. Instead of fine-tuning models for each editing task, our method reuses the priors of pre-trained text-to-video models to perform object and effect removal in a training-free manner. We also curate a real-world associated-effect benchmark and evaluation metric for more realistic assessment of video object removal.

About the Speaker

Saksham Singh Kushwaha is a candidate at UT Dallas, with research interests in audio-visual learning, spatial audio, and computer vision. Saksham holds a master’s degree from NYU and a bachelor’s degree from IIT Delhi.

Turning Models into Systems: AI Architecture That Works

This talk explores what it really takes to make "intelligent systems" work in the messy, high-stakes reality of production environments – not just in demos or pilots. Most AI initiatives do not fail because the algorithms are weak, but because the surrounding system is not designed to handle uncertainty, change, and operational demands.

The session shows how to separate the concerns of building and improving models from their use in daily operations, and how to create a stable core of rules, safety, and business meaning around which smarter components can evolve.

Instead of treating AI as a magic add-on, the talk frames it as a capability that must be grounded in the organization's language, workflows, and responsibilities. It demonstrates how to design that core so that new models, tools, and data sources can be plugged in, compared, and replaced without breaking trust.

Attendees will leave with a clear mental model and a set of practical design ideas for turning clever prototypes into robust, understandable, and adaptable intelligent systems that people on the ground are willing to rely on.

About the Speaker

Dr. Nikita Golovko is a seasoned Solution Architect with over 16 years of experience in designing scalable, secure, and cost-effective software architectures for industrial and business-critical systems.
12 attendees from this group