What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of AI, ML, computer vision, and complementary technologies. Every month we’ll bring you two diverse speakers working at the cutting edge of AI.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: [https://github.com/voxel51/fiftyone](https://github.com/voxel51/fiftyone)
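For a quick taste of the toolset, here is a minimal quick-start sketch (assuming a standard `pip install fiftyone` setup; the `quickstart` zoo dataset is a small bundled sample):

```python
# Minimal FiftyOne quick start: load a small sample dataset from
# the dataset zoo and browse it in the FiftyOne App.
import fiftyone as fo
import fiftyone.zoo as foz

# Download a small, pre-labeled sample dataset
dataset = foz.load_zoo_dataset("quickstart")

# Launch the browser-based app to explore samples and labels
session = fo.launch_app(dataset)
session.wait()  # keep the script alive while the app is open
```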
Upcoming events (4)
- Visual AI for Geospatial Data
Date and Time
Jan 29, 2025 at 9 AM Pacific / Noon Eastern
Is AI Creating a Whole New Earth-Aware Geospatial Stack? Promises and Challenges
The latest wave of AI innovation is profoundly changing many domains. In remote sensing, despite efforts like ours at Clay and others, it has been less so. In this talk, we will share our experience and explore whether geoAI represents a whole new stack for working with Earth data.
About the Speaker
Dr. Bruno Sanchez-Andrade Nuno is the executive director of the non-profit project Clay, an AI model for remote sensing. Previously, Bruno spent more than a decade building operational geospatial systems, including as director of the Planetary Computer at Microsoft, leading Big Data innovations at the World Bank, and as Chief Scientist at Mapbox.
Evaluating the Satlas and Clay Remote Sensing Foundational Models
Geospatial and Earth Observation have benefited from the new advances in computer vision. In this talk we are going to evaluate the accuracy and ease of use of two of these great new models – the Satlas and Clay foundational models. The evaluation will look at several distinct areas of the globe. Come see how this gift of foundational models can improve your work in geospatial or Earth observation analysis.
About the Speaker
Steve Pousty is a dad, partner, son, a founder, and a principal developer advocate at Voxel51. He can teach you about Computer Vision, Data Analysis, Java, Python, PostgreSQL, Microservices, and Kubernetes. He has deep expertise in GIS/Spatial, Remote Sensing, Statistics, and Ecology. Steve has a Ph.D. in Ecology and can be bribed with offers of bird watching or fly fishing.
Earth Monitoring for Everyone with Earth Index
Earth Index is an end-user-focused application that preprocesses global imagery through AI foundation models to enable rapid in-browser search and monitoring. Earth Genome builds Earth Index for critical environmental applications; it is being used today to report on illegal airstrips built in the Peruvian Amazon, track cattle factory farms across the planet for emissions modeling, and expose illegal gold mining in the Yanomami Indigenous Territory.
About the Speaker
Mikel Maron works on open technology for the earth. He leads product development and sets organizational pace at Earth Genome. Previously, Mikel led corporate social responsibility at Mapbox, elevated open mapping in the federal government as a Presidential Innovation Fellow, and founded community mapping initiatives, notably Map Kibera through the Ground Truth Initiative. He has a long association with the OpenStreetMap project, having founded the Humanitarian OpenStreetMap Team in 2005 and served for many years on the OSM Foundation Board.
- Jan 30 - AI, Machine Learning and Computer Vision Meetup
Date and Time
Jan 30, 2025 at 10 AM Pacific
Swimming Upstream: Using Machine Vision to Create Sustainable Practices in Fisheries of the Future
Fishing vessels are on track to generate 10 million hours of video footage annually, creating a massive machine learning operations challenge. At AI.Fish, we are building an end-to-end system enabling non-technical users to harness AI for catch monitoring and classification both on-board and in the cloud. This talk explores our journey in building these approachable systems and working toward answering an old question: How many fish are in the ocean?
About the Speaker
Orvis Evans is a Software Engineer at AI.Fish, where he co-architects ML-Ops pipelines and develops intuitive interfaces that make machine vision accessible to non-technical users. Drawing on his background in interactive systems, he builds front-end applications and APIs that enable fisheries to process thousands of hours of footage without machine learning expertise.
Scaling Semantic Segmentation with Blender
Generating datasets for semantic segmentation can be time-intensive. Learn how to use Blender’s Python API to create diverse and realistic synthetic data with automated labels, saving time and improving model performance. Preview the topics to be discussed in this Medium post.
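To give a flavor of the approach, here is a minimal, illustrative Blender scripting sketch (our own example, not the speaker’s pipeline) that uses object index passes to emit segmentation masks alongside renders:

```python
# Run inside Blender: render an image plus a segmentation map by
# assigning each object a pass index and enabling the object-index
# render pass. Illustrative sketch only.
import bpy

scene = bpy.context.scene
scene.render.engine = "CYCLES"  # the object-index pass requires Cycles
scene.view_layers["ViewLayer"].use_pass_object_index = True  # default layer name

# Give every mesh object a unique pass index -> unique mask value
for i, obj in enumerate(o for o in scene.objects if o.type == "MESH"):
    obj.pass_index = i + 1

# Route the IndexOB pass to a file output node via the compositor
scene.use_nodes = True
tree = scene.node_tree
render_layers = tree.nodes.new("CompositorNodeRLayers")
file_out = tree.nodes.new("CompositorNodeOutputFile")
file_out.base_path = "//masks"  # "//" means relative to the .blend file
tree.links.new(render_layers.outputs["IndexOB"], file_out.inputs[0])

bpy.ops.render.render(write_still=True)
```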
About the Speaker
Vincent Vandenbussche has a PhD in Physics and is an author and Machine Learning Engineer with 10 years of experience in software engineering and machine learning.
WACV 2025 - Elderly Action Recognition Challenge
Join us for a quick update on the Elderly Action Recognition (EAR) Challenge, part of the Computer Vision for Smalls (CV4Smalls) Workshop at the WACV 2025 conference!
This challenge focuses on advancing research in Activities of Daily Living (ADL) recognition, particularly within the elderly population, a domain with profound societal implications. Participants will employ transfer learning techniques with any architecture or model they choose; for example, starting from a general human action recognition benchmark and fine-tuning models on a subset of data tailored to elderly-specific activities.
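As a rough, hypothetical illustration of that setup (not an official challenge baseline), one might start from a Kinetics-pretrained video model and swap in a new classification head:

```python
# Hypothetical transfer-learning sketch for the EAR challenge:
# start from a Kinetics-400-pretrained 3D ResNet and fine-tune a
# new head on elderly-specific activity classes.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_EAR_CLASSES = 12  # placeholder; use the challenge's class count

model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_EAR_CLASSES)

# Optionally freeze the backbone and train only the new head first
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)
# ... then train on clips shaped (batch, 3, frames, height, width)
```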
Sign up for the EAR challenge and learn more.
About the Speaker
Paula Ramos, PhD is a Senior DevRel and Applied AI Research Advocate at Voxel51.
Transforming Programming Ed: An AI-Powered Teaching Assistant for Scalable and Adaptive Learning
The future of education lies in personalized and scalable solutions, especially in fields like computer engineering, where complex concepts often challenge students. This talk introduces Lumina (AI Teaching Assistant), a cutting-edge agentic system designed to revolutionize programming education through its innovative architecture and teaching strategies. Built using the OpenAI API, LangChain, RAG, and ChromaDB, Lumina employs an agentic, multi-modal framework that dynamically integrates course materials, technical documentation, and pedagogical strategies into an adaptive knowledge-driven system. Its unique “Knowledge Components” approach decomposes programming concepts into interconnected teachable units, enabling proficiency-based learning and dynamic problem-solving guidance. Attendees will discover how Lumina’s agentic architecture enhances engagement, fosters critical thinking, and improves concept mastery, paving the way for scalable AI-driven educational solutions.
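To make the RAG portion of such a system concrete, here is a minimal illustrative sketch using ChromaDB and the OpenAI API (our assumption-laden example, not Lumina’s actual code):

```python
# Minimal RAG sketch in the spirit of an AI teaching assistant:
# index course materials in ChromaDB, retrieve relevant chunks,
# and have an LLM answer grounded in them. Illustrative only.
import chromadb
from openai import OpenAI

chroma = chromadb.Client()
collection = chroma.create_collection("course_materials")
collection.add(
    documents=[
        "A pointer stores the memory address of another variable.",
        "Dereferencing a null pointer is undefined behavior in C.",
    ],
    ids=["doc1", "doc2"],
)

question = "What happens if I dereference a null pointer?"
hits = collection.query(query_texts=[question], n_results=2)
context = "\n".join(hits["documents"][0])

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
reply = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only the provided course context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(reply.choices[0].message.content)
```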
About the Speaker
Nittin Murthi Dhekshinamoorthy is a computer engineering student and researcher at the University of Illinois Urbana-Champaign with a strong focus on advancing artificial intelligence to solve real-world challenges in education and technology. He is the creator of an AI agent-based Teaching Assistant, leveraging cutting-edge frameworks to provide scalable, adaptive learning solutions, and has contributed to diverse, impactful projects, including natural language-to-SQL systems and deep learning models for clinical image segmentation.
- Best of NeurIPS - Feb 4
Date and Time
Feb 4, 2025 at 9 AM Pacific
Welcome to the Best of NeurIPS virtual series that highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
No "Zero-Shot" Without Exponential Data
Web-crawled pretraining datasets underlie the impressive “zero-shot” evaluation performance of multimodal models. However, it is unclear how meaningful the notion of “zero-shot” generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during “zero-shot” evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets?
Through thorough experiments, we consistently find that, far from exhibiting “zero-shot” generalization, multimodal models require exponentially more data to achieve linear improvements in downstream “zero-shot” performance, following a sample-inefficient log-linear scaling trend. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. Taken together, our study reveals an exponential need for training data, which implies that the key to “zero-shot” generalization capabilities under large-scale training paradigms remains to be found.
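Paraphrasing the trend in a formula (our shorthand rather than the paper’s notation, with fit constants rather than measured values):

```latex
% Log-linear scaling: downstream "zero-shot" performance on a
% concept c grows roughly linearly in the log of its pretraining
% frequency f(c), so constant additive gains require
% multiplicative growth in data.
\[
  \operatorname{perf}(c) \;\approx\; \alpha \log f(c) + \beta
\]
```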
Read the paper, “No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance”
About the Speaker
Vishaal Udandarao is a third-year ELLIS PhD student, jointly working with Matthias Bethge at the University of Tuebingen and Samuel Albanie at Google DeepMind. He did his undergraduate degree in computer science at IIIT Delhi from 2016 to 2020, and his master’s in machine learning at the University of Cambridge in 2021.
Understanding Bias in Large-Scale Visual Datasets
Truly general-purpose vision systems require pre-training on diverse and representative visual datasets. The “dataset classification” experiment reveals that modern large-scale visual datasets are still very biased: neural networks can achieve excellent accuracy in classifying which dataset an image is from. However, the concrete forms of bias among these datasets remain unclear. In this talk, I will present a framework to identify the unique visual attributes distinguishing these large-scale datasets.
Read the paper, “Understanding Bias in Large-Scale Visual Datasets”
About the Speaker
Boya Zeng is an undergraduate student at the University of Pennsylvania. He is currently working with Prof. Zhuang Liu at Princeton University on visual datasets and generative models.
Map It Anywhere: Empowering BEV Map Prediction using Large-scale Public Datasets
Top-down Bird’s Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps.
We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we demonstrate the ease of automatically collecting a dataset of 1.2 million FPV & BEV pairs encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation.
Read the paper, “Map It Anywhere (MIA): Empowering Bird’s Eye View Mapping using Large-scale Public Data”
About the Speakers
Cherie Ho is a final-year robotics PhD student at Carnegie Mellon University working with Prof. Sebastian Scherer. Her research interest lies at the intersection of field robotics, computer vision, and machine learning, developing robots that can continuously learn in new scenarios. She has developed generalizable, adaptive, and uncertainty-aware robot algorithms for dynamic real-world applications, including high-speed offroad driving, outdoor multi-drone systems, and outdoor wheelchairs. She is a recipient of the Croucher Scholarship for Doctoral Study.
Jiaye (Tony) Zou is a senior CS undergraduate from Carnegie Mellon University. He is interested in multi-modal perception in dynamic real-world environments. He has developed MapItAnywhere, a large-scale data engine and baseline model for generalizable Bird’s Eye View mapping.
Omar Alama is starting his PhD at Carnegie Mellon University ECE in Fall 2024, advised by Prof. Sebastian Scherer and working in the AirLab at the CMU Robotics Institute. His research interests revolve around classical and modern deep-learning-based computer vision, used to build generalizable and efficient perception systems.
- Best of NeurIPS - Feb 6
Date and Time
Feb 6, 2025 at 9 AM Pacific
Welcome to the Best of NeurIPS virtual series that highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
Intrinsic Self-Supervision for Data Quality Audits
Benchmark datasets in computer vision often contain issues such as off-topic samples, near-duplicates, and label errors, compromising model evaluation accuracy. This talk will discuss SelfClean, a data-cleaning framework that leverages self-supervised representation learning and distance-based indicators to detect these issues effectively.
By framing the task as a ranking or scoring problem, SelfClean minimizes human effort while outperforming competing methods in identifying synthetic and natural contamination across natural and medical domains. With this methodology, we identified up to 16% of problematic samples in current benchmark datasets and enhanced the reliability of model performance evaluation.
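The distance-based ranking idea can be illustrated generically (a sketch of the principle, not the SelfClean codebase): embed every image with a self-supervised encoder, score all pairs by embedding distance, and surface the closest pairs as near-duplicate candidates:

```python
# Generic sketch of distance-based near-duplicate ranking: given
# self-supervised embeddings of shape (n_samples, dim), score each
# pair by cosine distance and rank ascending; the top of the list
# contains the most likely near-duplicates.
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

def rank_near_duplicates(embeddings: np.ndarray, top_k: int = 10):
    dist = cosine_distances(embeddings)
    n = dist.shape[0]
    iu, ju = np.triu_indices(n, k=1)   # unique pairs only
    order = np.argsort(dist[iu, ju])   # smallest distance first
    return [(int(iu[m]), int(ju[m]), float(dist[iu[m], ju[m]]))
            for m in order[:top_k]]

# Example with random vectors standing in for SSL features
emb = np.random.default_rng(0).normal(size=(100, 128))
for i, j, d in rank_near_duplicates(emb, top_k=5):
    print(f"samples {i} and {j}: distance {d:.3f}")
```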
Read the paper, “Intrinsic Self-Supervision for Data Quality Audits”
About the Speaker
Fabian Gröger is a second-year PhD Student supervised by Alexander A. Navarini and Marc Pouly at the University of Basel. His research interests include self-supervised learning, data-centric machine learning research, and medical imaging.
CLIP: Insights into Zero-Shot Image Classification with Mutual Knowledge
We interpret CLIP’s zero-shot image classification by examining shared textual concepts learned by its vision and language encoders. We analyze 13 CLIP models across various architectures, sizes, and datasets. The approach highlights a human-friendly way to understand CLIP’s classification decisions.
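For context, the zero-shot classification recipe being interpreted looks roughly like this (a standard setup, shown here with Hugging Face Transformers for illustration):

```python
# Standard CLIP zero-shot classification: score an image against
# text prompts and take a softmax over image-text similarities.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")
```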
Read the paper, “Interpreting and Analysing CLIP’s Zero-Shot Image Classification via Mutual Knowledge”
About the Speaker
Fawaz Sammani is a 2nd year PhD student at the Vrije Universiteit Brussel. His research focuses on Human-Friendly Interpretability and Explainability of deep neural networks.
Multiview Scene Graph
Motivated by how humans perceive scenes, we propose the Multiview Scene Graph (MSG) as a general topological scene representation. MSG constructs a place+object graph from unposed RGB images, and we provide novel metrics to evaluate the graph quality. We combine visual place recognition and object association to build MSG in a single Transformer decoder model. We believe MSG can connect the dots across classic vision tasks to promote spatial intelligence and open new doors for topological 3D scene understanding.
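A toy sketch of the place+object graph idea (our illustration with networkx, not the paper’s code):

```python
# Toy place+object graph in the spirit of MSG: place nodes connect
# to the objects observed from them, so shared objects topologically
# link different places.
import networkx as nx

g = nx.Graph()
g.add_nodes_from(["place_0", "place_1"], kind="place")
g.add_nodes_from(["chair_a", "table_b"], kind="object")
g.add_edges_from([
    ("place_0", "chair_a"),  # chair seen from place 0
    ("place_0", "table_b"),
    ("place_1", "table_b"),  # same table seen from place 1
])

# Places that co-observe an object become topologically connected
print(nx.has_path(g, "place_0", "place_1"))  # True, via table_b
```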
Read the paper, “Multiview Scene Graph”
About the Speaker
Juexiao Zhang is a second-year PhD student in computer science at NYU Courant, advised by Professor Chen Feng. He is interested in learning scene representations that are useful for robots to understand the world and interact with it.
Past events (2329)
- Jan 22 - Advanced Computer Vision Data Curation and Model Evaluation Workshop