What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we’ll bring you two diverse speakers working at the cutting edge of computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone
📣 Past Speakers
* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanyan at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT
📚 Resources
* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links
Sponsors
Upcoming events (4+)
- Nov 14 - AI, ML and Computer Vision Meetup (network event; 94 attendees from 14 groups)
Register for the Zoom:
https://voxel51.com/computer-vision-events/ai-machine-learning-computer-vision-meetup-nov-14-2024/
Human-in-the-loop: Practical Lessons for Building Comprehensive AI Systems
AI systems often struggle with data limitations, data distribution shift over time, and a poor user experience. Human-in-the-loop design offers a solution by placing users at the center of AI systems and leveraging human feedback for continuous improvement.
In this talk, we'll dive deep into a recent project at Merantix Momentum: an interactive tool for large-scale automatic analysis of rodent behavior in videos. We'll discuss the machine learning components, including pose estimation, behavior classification, and active learning, as well as the technical challenges and the impact of the project.
About the Speaker
Adrian Loy holds an MSc in IT Systems Engineering and has spent the last 5 years at Merantix Momentum planning and executing computer vision projects for a variety of clients. He currently leads the Machine Learning Engineering Team at Momentum.
Deploying ML models on Edge Devices using Qualcomm AI Hub
In this talk, we address the common challenges developers face when migrating AI workloads from the cloud to edge devices. Qualcomm aims to democratize AI at the edge, easing the transition by supporting familiar frameworks and data types. This is where Qualcomm AI Hub comes in. Developers can follow along and gain the knowledge and tools to efficiently deploy optimized models on real devices using Qualcomm AI Hub.
We’ll walk through how to get started with Qualcomm AI Hub, go through examples of how to optimize models and bundle the downloadable target asset into your application, and talk through iterating on your model until it meets the performance requirements for on-device deployment!
About the Speaker
Bhushan Sonawane has optimized and deployed thousands of AI models on-device across the iOS and Android ecosystems. He is currently building AI Hub at Qualcomm to make the on-device journey on Android and Snapdragon platforms as seamless as possible.
Curating Excellence: Strategies for Optimizing Visual AI Datasets
In this talk, Harpreet will discuss common challenges plaguing visual AI datasets and their impact on model performance, and share tips and tricks for curating datasets to make the most of any compute budget or network architecture.
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s especially interested in RAG, agents, and multimodal AI.
- ECCV Redux: Day 1 - Nov 19 (network event; 50 attendees from 16 groups)
Missed the European Conference on Computer Vision (ECCV) last month? Have no fear, we have collected some of the best research from the show into a series of online events.
Fast and Photo-realistic Novel View Synthesis from Sparse Images
Novel view synthesis generates new perspectives of a scene from a set of 2D images, enabling 3D applications like VR/AR, robotics, and autonomous driving. Current state-of-the-art methods produce high-fidelity results but require many input images, while sparse-view approaches often suffer from artifacts or slow inference. In this talk, I will present my research on fast, photorealistic novel view synthesis techniques capable of handling extremely sparse input views.
ECCV 2024 Paper: CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
About the Speaker
Avinash Paliwal is a PhD Candidate in the Aggie Graphics Group at Texas A&M University. His research is focused on 3D Computer Vision and Computational Photography.
Robust Calibration of Large Vision-Language Adapters
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration of the zero-shot baseline under distribution shift. We identify the increase in logit ranges as the underlying cause of miscalibration in CLIP adaptation methods, in contrast with previous work on calibrating fully supervised models. Motivated by these observations, we present a simple, model-agnostic solution to mitigate miscalibration: scaling the logit range of each sample to that of its zero-shot prediction logits.
ECCV 2024 Paper: Robust Calibration of Large Vision-Language Adapters
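The per-sample logit-range scaling described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes "logit range" means max minus min of one sample's logits and matches it by min-max rescaling the adapted logits onto the zero-shot logits of the same sample. The helper names `rescale_logit_range` and `softmax` are hypothetical, and the paper's exact formulation may differ.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rescale_logit_range(adapted: List[float], zero_shot: List[float]) -> List[float]:
    """Map one sample's adapted logits onto the [min, max] range of the
    same sample's zero-shot logits.

    Assumption: "range" = max - min, matched via min-max rescaling.
    Hypothetical helper; the paper's exact formulation may differ.
    """
    a_min, a_max = min(adapted), max(adapted)
    z_min, z_max = min(zero_shot), max(zero_shot)
    a_range = a_max - a_min
    if a_range == 0.0:
        # All adapted logits are equal: there is no spread to rescale.
        return list(adapted)
    scale = (z_max - z_min) / a_range
    # A non-negative scale keeps the class ordering (and hence the argmax) intact.
    return [z_min + (x - a_min) * scale for x in adapted]


if __name__ == "__main__":
    adapted = [2.0, 12.0, 4.0]   # toy adapter logits with an inflated range
    zero_shot = [1.0, 3.0, 2.0]  # toy zero-shot logits for the same sample
    rescaled = rescale_logit_range(adapted, zero_shot)
    print("rescaled logits:", rescaled)
    print("max confidence before:", max(softmax(adapted)))
    print("max confidence after: ", max(softmax(rescaled)))
```

Because the rescaling is monotonic, the predicted class does not change; only the spread of the logits, and with it the softmax confidence, is pulled back toward the zero-shot level.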
About the Speaker
Balamurali Murugesan is currently pursuing a Ph.D. focused on developing reliable deep learning models. Earlier, he completed his master’s thesis on accelerating MRI reconstruction. He has published 25+ research articles in renowned venues.
Tree-of-Life Meets AI: Knowledge-guided Generative Models for Understanding Species Evolution
A central challenge in biology is understanding how organisms evolve and adapt to their environment, acquiring variations in observable traits across the tree of life. However, measuring these traits is often subjective and labor-intensive, making trait discovery a highly label-scarce problem. With the advent of large-scale biological image repositories and advances in generative modeling, there is now an opportunity to accelerate the discovery of evolutionary traits. This talk focuses on using generative models to visualize evolutionary changes directly from images without relying on trait labels.
ECCV 2024 Paper: Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
About the Speaker
Mridul Khurana is a PhD student at Virginia Tech and a researcher with the NSF Imageomics Institute. His research focuses on AI4Science, leveraging multimodal generative modeling to drive discoveries across scientific domains.
- ECCV Redux: Day 3 - Nov 21 (network event; 14 attendees from 16 groups)
Missed the European Conference on Computer Vision (ECCV) last month? Have no fear, we have collected some of the best research from the show into a series of online events.
Closing the Gap Between Satellite and Street-View Imagery Using Generative Models
With the growing availability of satellite imagery (e.g., Google Earth), nearly every part of the world can be mapped, though street-view images remain limited. Creating street views from satellite data is crucial for applications like virtual model generation, media content enhancement, 3D gaming, and simulations. This task, known as satellite-to-ground cross-view synthesis, is tackled by our geometry-aware framework, which maintains geometric precision and relative geographical positioning using satellite information.
ECCV 2024 Paper
About the Speaker
Ningli Xu is a Ph.D. student at The Ohio State University, specializing in generative AI and computer vision, with a focus on addressing image and video generation challenges in the geospatial domain.
High-Efficiency 3D Scene Compression Using Self-Organizing Gaussians
In just over a year, 3D Gaussian Splatting (3DGS) has made waves in computer vision for its remarkable speed, simplicity, and visual quality. Yet even a single-room scene can exceed a gigabyte, making it difficult to scale up to larger environments such as city blocks. In this talk, we’ll explore compression techniques that reduce the 3DGS memory footprint. We’ll dive deep into our novel approach, Self-Organizing Gaussians, which maps splatting attributes into a 2D grid, using a high-performance parallel linear assignment sorting algorithm to reorganize the splats on the fly. This grid assignment lets us leverage traditional 2D image compression techniques like JPEG to store 3D data efficiently. Our method is fast and easy to decompress and achieves a surprisingly competitive compression ratio. The drastically reduced memory requirements make it well suited for streaming 3D scenes at large scale, which is especially useful for AR, VR, and gaming applications.
ECCV 2024 Paper: Compact 3D Scene Representation via Self-Organizing Gaussian Grids
About the Speaker
Wieland Morgenstern is a Research Associate at the Computer Vision & Graphics group at Fraunhofer HHI and is pursuing a PhD at Humboldt University Berlin. His research focuses on representing 3D scenes and virtual humans.
Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
We present Skeleton Recall Loss, a novel loss function for topologically accurate and efficient segmentation of thin, tubular structures, such as roads, nerves, or vessels. By circumventing expensive GPU-based operations, we reduce computational overheads by up to 90% compared to the current state-of-the-art, while achieving overall superior performance in segmentation accuracy and connectivity preservation. Additionally, it is the first multi-class capable loss function for thin structure segmentation.
ECCV 2024 Paper
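The core idea of a skeleton-based recall term can be sketched in plain Python. This is a minimal sketch under assumptions, not the authors' code: it treats the loss as one minus the soft recall of predicted foreground probabilities on a ground-truth skeleton mask that was computed ahead of time on the CPU (for example, skeletonized and dilated during data loading), which is one way to keep expensive operations off the GPU. The function names, the flattened-list inputs, the `1 - recall` form, and the per-class averaging for the multi-class case are all assumptions for illustration; in practice such a term would be combined with a standard segmentation loss.

```python
from typing import Sequence


def skeleton_recall_loss(
    probs: Sequence[float], skeleton: Sequence[int], eps: float = 1e-8
) -> float:
    """1 - soft recall of predicted probabilities on a ground-truth skeleton.

    probs:    flattened per-pixel foreground probabilities for one class.
    skeleton: flattened binary (0/1) skeleton mask of the ground truth,
              assumed precomputed on the CPU (e.g. skeletonized and dilated).
    Hypothetical helper; the published loss may differ in detail.
    Note: an empty skeleton yields a loss of 1.0 in this sketch.
    """
    if len(probs) != len(skeleton):
        raise ValueError("probs and skeleton must have the same length")
    covered = sum(p * s for p, s in zip(probs, skeleton))  # soft hits on the skeleton
    total = sum(skeleton)                                   # number of skeleton pixels
    return 1.0 - covered / (total + eps)


def multiclass_skeleton_recall_loss(
    probs_per_class: Sequence[Sequence[float]],
    skeletons_per_class: Sequence[Sequence[int]],
) -> float:
    """Average the per-class loss (assumed aggregation for the multi-class case)."""
    losses = [
        skeleton_recall_loss(p, s)
        for p, s in zip(probs_per_class, skeletons_per_class)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    skel = [0, 1, 1, 1, 0]
    connected = [0.1, 0.9, 0.9, 0.9, 0.1]  # prediction follows the thin structure
    broken = [0.1, 0.9, 0.0, 0.9, 0.1]     # prediction has a gap in the middle
    print("connected:", skeleton_recall_loss(connected, skel))
    print("broken:   ", skeleton_recall_loss(broken, skel))
```

In the toy run, the prediction with a gap on the skeleton receives a higher loss than the connected one, even though a single missing pixel barely changes overlap-based scores; that sensitivity to breaks is the connectivity property the abstract emphasizes.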
About the Speakers
Maximilian Rokuss holds an M.Sc. in Physics from Heidelberg University and is now a PhD student in Medical Image Computing at the German Cancer Research Center (DKFZ) and Heidelberg University.
Yannick Kirchhoff holds an M.Sc. in Physics from Heidelberg University and is now a PhD student in Medical Image Computing at the German Cancer Research Center (DKFZ) and the Helmholtz Information and Data Science School for Health.
- ECCV Redux: Day 4 - Nov 22 (network event; 15 attendees from 16 groups)
Missed the European Conference on Computer Vision (ECCV) last month? Have no fear, we have collected some of the best research from the show into a series of online events.
Zero-shot Video Anomaly Detection: Leveraging Large Language Models for Rule-Based Reasoning
Video Anomaly Detection (VAD) is critical for applications such as surveillance and autonomous driving. However, existing methods lack transparent reasoning, limiting public trust in real-world deployments. We introduce a rule-based reasoning framework that leverages Large Language Models (LLMs) to induce detection rules from few-shot normal samples and apply them to identify anomalies, incorporating strategies such as rule aggregation and perception smoothing to enhance robustness. The abstract nature of language enables rapid adaptation to diverse VAD scenarios, ensuring flexibility and broad applicability.
ECCV 2024 Paper: Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
About the Speaker
Yuchen Yang is a Ph.D. Candidate in the Department of Computer Science at Johns Hopkins University. Her research aims to deliver functional, trustworthy solutions for machine learning and AI systems.
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding tasks. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.
ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
About the Speaker
Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
Past events (76)
- Nov 6 - Workshop: Developing Data-Centric AI Apps with FiftyOne Plugins (network event; 61 attendees from 14 groups; this event has passed)