Part of Computer Vision Meetups - 16 groups

Bangalore Computer Vision Meetup Group

4.5•60 ratings

About us

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we’ll bring you two diverse speakers working at the cutting edge of computer vision.

Are you interested in speaking at a future Meetup?
Is your company interested in sponsoring a Meetup?

Contact the Meetup organizers!

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone

📣 Past Speakers

* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci,ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanya at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Jiajing Chen at Syracuse University
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT

📚 Resources

* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links

Upcoming events

See all

Network event
July 23 - AI, ML, and Computer Vision Meetup
Thu, Jul 23 · 9:30 PM IST
·
Online
Online
26 attendees from 16 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Date, Time and Location

Jul 23, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity

The domain of automatic video trailer generation is currently undergoing a profound paradigm shift, transitioning from heuristicbased extraction methods to deep generative synthesis. While early methodologies relied heavily on low-level feature engineering, visual saliency, and rule-based heuristics to select representative shots, recent advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and diffusion-based video synthesis have enabled systems that not only identify key moments but also construct coherent, emotionally resonant narratives.

This survey provides a comprehensive technical review of this evolution, with a specific focus on generative techniques including autoregressive Transformers, LLM-orchestrated pipelines, and text-to-video foundation models like OpenAI's Sora and Google's Veo. We analyze the architectural progression from Graph Convolutional Networks (GCNs) to Trailer Generation Transformers (TGT), evaluate the economic implications of automated content velocity on User-Generated Content (UGC) platforms, and discuss the ethical challenges posed by high-fidelity neural synthesis.

By synthesizing insights from recent literature, this report establishes a new taxonomy for AI-driven trailer generation in the era of foundation models, suggesting that future promotional video systems will move beyond extractive selection toward controllable generative editing and semantic reconstruction of trailers.

About the Speaker

Abhishek Dharmaratnakar is an Engineering Leader at Google leading YouTube Premium. His work focuses on the intersection of hyperscale media infrastructure and generative artificial intelligence, directing cross-functional engineering organizations to redefine how billions of users consume and create content

Making Agent Systems Observable, Reliable, and Testable

In this talk, I’ll share practical lessons from building real agent systems in computer vision workflows, focusing on how to design evaluation loops, observability pipelines, and sandboxed environments that make agents reliable in practice. We’ll explore how to measure behavior end-to-end, test components independently, and build feedback loops that help agents improve over time, even as tools, models, and pipelines evolve. This talk is for engineers and builders who want to move beyond demos and learn how to make agent systems production-ready.

About the Speaker

Adonai Vera - Machine Learning Engineer & DevRel at Voxel51. With over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.

Training-Free Object and Associated Effect Removal in Videos

I will be presenting our recent work, Object-WIPER, which focuses on removing objects and their associated effects from videos. Instead of fine-tuning models for each editing task, our method reuses the priors of pre-trained text-to-video models to perform object and effect removal in a training-free manner. We also curate a real world associated-effect benchmark and evaluation metric for more realistic assessment of video object removal.

About the Speaker

Saksham Singh Kushwaha is a candidate at UT Dallas, with research interests in audio-visual learning, spatial audio, and computer vision. I received my master’s degree from NYU and bachelor’s degree from IIT Delhi.

Turning Models into Systems: AI Architecture That Works

This talk explores what it really takes to make "intelligent systems" work in the messy, high-stakes reality of production environments – not just in demos or pilots. Most AI initiatives do not fail because the algorithms are weak, but because the surrounding system is not designed to handle uncertainty, change, and operational demands.

The session shows how to separate the concerns of building and improving models from their use in daily operations, and how to create a stable core of rules, safety, and business meaning around which smarter components can evolve.

Instead of treating AI as a magic add-on, the talk frames it as a capability that must be grounded in the organization's language, workflows, and responsibilities. It demonstrates how to design that core so that new models, tools, and data sources can be plugged in, compared, and replaced without breaking trust.

Attendees will leave with a clear mental model and a set of practical design ideas for turning clever prototypes into robust, understandable, and adaptable intelligent systems that people on the ground are willing to rely on.

About the Speaker

Dr. Nikita Golovko is a seasoned Solution Architect with over 16 years of experience in designing scalable, secure, and cost-effective software architectures for industrial and business-critical systems.
2 attendees from this group
Network event
July 29 - MCP, Agents and Skills Meetup
Wed, Jul 29 · 9:30 PM IST
·
Online
Online
43 attendees from 16 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across MCP, Agents and Skills.

Date, Time and Location

Jul 29, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

The Agent Control Plane: Turning Coding Agents into Reliable Engineering Workflows

AI coding agents are powerful but often unreliable — they hallucinate, lose context, and produce inconsistent results across runs. In this talk, Alex introduces Atomic, an open-source control plane that adds persistent memory, deterministic workflow phases (Research → Specify → Implement → Ship), and human-in-the-loop gates around coding agents like Claude Code and GitHub Copilot. The result: repeatable, auditable engineering workflows that teams can actually trust in production.

About the Speaker

Alex Lavaee is an Applied AI engineer at Microsoft Research and the creator of Atomic, an open-source SDK that wraps deterministic, research-to-execution workflows around AI coding agents. He previously conducted AI research at Harvard Medical School and Boston University, and has worked as an MLE and data scientist at companies including Boeing and Themis AI, an MIT CSAIL spinoff.

UISurf: Toward Universal UI Automation with Cross-Environment Agents

In this talk, we introduce UISurf, an open-source multimodal agentic UI automation platform in which agents can perceive, reason, and collaborate across browser and desktop environments to complete end-to-end tasks that require interaction with multiple user interfaces.

UISurf comprises three main components: uisurf-agent, the runtime for UI automation agents; uisurf-admin, the session orchestration and management service; and uisurf-app, the full-stack user application. Its multi-agent architecture includes a planning_agent that transforms natural-language requests into structured execution plans, specialized Browser and Desktop Agents for environment-specific interaction, an automation_agent that coordinates execution and inter-agent handoff through Agent-to-Agent (A2A) communication, and a summarization_agent that produces the final task summary for the user. UISurf supports both autonomous execution and human-in-the-loop supervision, offering a practical and extensible framework for studying and deploying cross-environment UI automation.

About the Speaker

Dr. Henry Ruiz is a Research Scientist at Texas A&M University @ AgriLife Research, specializing in Artificial Intelligence (AI) and Remote Sensing. His work focuses on the development of advanced software systems and computational algorithms for analyzing multi-source remote sensing data, including satellite imagery, UAVs (Unmanned Aerial Vehicles), LiDAR (Light Detection and Ranging), and Ground Penetrating Radar (GPR).

From Manual Workflows to AI-Assisted Skills: Building Reliable Internal Automation

In this session, I will discuss how teams can turn repetitive manual workflows into reliable AI-assisted and automation-driven “skills.” I will share practical lessons from building internal tools for CAD and engineering workflows, including how automation can reduce manual effort, improve consistency, and support better process control. The talk will also cover why many AI/agent experiments fail when they are not connected to real team workflows, standards, and validation steps. Attendees will walk away with a practical framework for identifying repeatable workflows, designing useful internal tools, and adopting AI assistance without losing accuracy or trust.

About the Speaker

Janvi Vijaykumar Saddi - Janvi Saddi is a Computer Science graduate and CAD/Data Automation professional with experience in data center design workflows, AutoCAD automation, process improvement, and data analytics. She currently works as a CAD Tech 2 at Astreya, supporting Google data center design workflows by building internal tools that reduce manual effort, improve accuracy, and streamline engineering processes. Her background also includes SQL, Power BI, market research analytics, and AI-assisted development.

Building Safe Agent Sandboxes: Let Agents Act Without Breaking Production

AI agents become truly useful when they can take action, not just generate text. But giving agents access to code, data, and systems raises an important question: how do you let them explore, execute, fail, and improve without putting production at risk?

In this talk, we'll explore the sandbox pattern for agent systems and how to equip agents with tools to read, write, execute, and iterate within controlled environments while using permissions, human approval, and safety guardrails to keep them reliable. We'll cover practical architectures and lessons learned for building agents that can safely evolve from experimentation to production..

About the Speaker

Adonai Vera - Adonai Vera - Machine Learning Engineer & DevRel at Voxel51. With over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.
3 attendees from this group
Network event
Aug 4 - Visual AI in Manufacturing
Tue, Aug 4 · 9:30 PM IST
·
Online
Online
17 attendees from 16 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics at the intersection of manufacturing, AI, ML, and computer vision.

Date, Time and Location

Aug 04, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

Enabling Multimodal Agents on the Edge

The next generation of AI agents is moving beyond cloud-based text-only models and will interact with the physical multimodal world in real-time. For example in the vision domain, AI agents rely on Vision-Language Models (VLMs) in their backbone. However, deploying massive VLMs with billions of parameters on the edge devices remains a significant engineering hurdle.

Drawing on our recent ICML and CVPR research papers, this session explores advancements in agentic model optimizations, specifically how distillation and pruning transform 'heavyweight' models into lean, edge-ready engines. Lastly, I present our UI agent running on the actual phone that is being developed by our lab's team.

About the Speaker

Denis Gudovskiy is a Distinguished AI Engineer at Panasonic North America where he conducts R&D activities of various core AI methods, including multimodal and hardware-efficient agents, supervised and RL training pipelines, and robustness to out-of-distribution scenarios.

When the Camera Can’t Be Trusted: Health-Aware Visual AI for Reliable Near-Miss Detection

Near-miss detection systems are often evaluated as though every camera frame is equally trustworthy, even though blur, poor exposure, occlusion, contamination, and changing lighting can silently degrade the visual evidence used to make safety decisions. This talk presents an online camera-health framework that estimates visual reliability before downstream perception performance significantly deteriorates.

I will discuss how camera-health signals can support condition-aware evaluation, prioritize human review, reduce unreliable alerts, and trigger appropriate fallback behavior. Drawing from research in safety-critical visual perception, the talk will demonstrate how these principles can be adapted to industrial video systems operating across different cameras, shifts, layouts, and environmental conditions.

The presentation will also connect camera-health monitoring with rare-event discovery and failure-driven dataset improvement for more trustworthy near-miss detection.

About the Speaker

Shiva Aher is a computer vision researcher with a graduate background in computer science from the Georgia Institute of Technology, specializing in artificial intelligence.

Agentic VLM applications in manufacturing

Vision Language Models (VLMs) introduce net-new functionality to vision workloads in manufacturing that traditional computer vision models simply do not offer (e.g., open-vocabulary detection, in-context-learning). Even so, fine-tuned models like YOLO offer a level of precision and recall that today's VLMs struggle to match out-of-the-box.

Through agentic harnesses that coordinate calls to VLMs, we can start to deliver similar reliability on manufacturing-relevant tasks (e.g., many-class, many-instance detection), while also supporting the net new functionalities (e.g., multimodal search) that make VLMs distinct. In this talk, we walk through the design of these harnesses, how you serve them efficiently, and how they deliver value in manufacturing.

About the Speaker

Subraiz Ahmed is a member of the Technical Staff at Perceptron AI. He builds the infrastructure to serve frontier vision models. He previously founded a series of startups.
3 attendees from this group
Network event
Aug 6 - Audio and AI Meetup
Thu, Aug 6 · 9:30 PM IST
·
Online
Online
11 attendees from 16 groups
Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Date, Time and Location

Aug 06, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

Do Speech Models Actually Understand Speech? Evaluating Speech LLMs Under Realistic Spoken Instruction Conditions

Speech Large Language Models (SLLMs) are increasingly capable; but are we evaluating them the right way? Most benchmarks rely on text prompts, yet real users interact with these systems through speech, a modality that introduces noise, disfluencies, and stylistic variation that text simply doesn't capture.
In this talk, we present findings from a systematic study across 11 tasks, 12 languages, and five prompt styles, examining how prompt modality, language, and task type shape SLLM performance.

About the Speaker

Maike Züfle is a PhD student at the Karlsruhe Institute of Technology (KIT), working in Prof. Jan Niehues's group on interactive speech systems for more natural human–machine communication. Her research focuses on instruction-following speech models with speech as both input and output, with a recent emphasis on full-duplex systems. Beyond her research, she co-organises the instruction-following and speech translation metrics shared tasks at IWSLT. She is a 2026 Apple Scholar in AI/ML.

AI based Audio Forensics

In this presentation, attendees will discover several modules developed by Gradiant for the detection and analysis of synthetically generated or manipulated audio. The session will be delivered by one of the developers involved in the design and implementation of these technologies, providing first-hand insight into their capabilities and underlying methodology.

The presentation will cover the traceability module, which helps identify the origin of AI-generated content. It will also cover the segment detection tool, designed to locate manipulated regions within an audio recording, as well as the complete audio detection tool, which assesses whether an entire recording has been synthetically generated.

About the Speaker

Daniel Paniagua Ares is a research engineer at Gradiant. Graduated in computer engineering from the FIC and with a master's degree in AI from the VIU.

Curating, Searching, and Evaluating Audio Datasets in FiftyOne

In this talk, we'll start with the ESC-50 environmental-sound dataset to show how FiftyOne represents audio: browsing clips in the tabular view, rendering spectrograms directly in the sample grid with a custom renderer, and turning sounds into searchable vectors with CLAP embeddings. Then we'll demo a similarity-search panel that lets you query an entire audio collection by example clip or a natural-language prompt to quickly find matching sounds.

We'll conclude with a live research problem: Audio Moment Retrieval from the DCASE 2026 Challenge, where the goal is to localize the exact moment in a long recording that matches a text query. We'll frame this as temporal detection, evaluate predictions, and visualize ground-truth vs. predicted moments on an interactive timeline to intuitively expose model failure modes.

Attendees will leave with a concrete blueprint and open code for applying visual data-centric AI practices to their own audio and multimodal datasets.

About the Speaker

John Duncan is a Machine Learning Engineer, Customer Success at Voxel51. His research interests include vision, LiDAR, and audio perception for robots and intelligent systems.
3 attendees from this group