
What we’re about
This is the official group for the Data Science Salon Community in San Francisco/Bay Area: https://datascience.salon/
Our mission is to bring together everyone who is interested in or working with cloud technologies. Learn, network, present, and meet others in the cloud space.
Access on-demand content on YouTube: https://www.youtube.com/c/DataScienceSalon
Upcoming events (3)
July 9 - Best of CVPR
Network event, 150 attendees from 39 groups hosting
Join us for a series of virtual events focused on the most interesting and groundbreaking research presented at this year's CVPR conference!
When
July 9, 2025 at 9 AM Pacific
Where
Online. Register for the Zoom!
What Foundation Models really need to be capable of for Autonomous Driving – The Drive4C Benchmark
Foundation models hold the potential to generalize the driving task and support language-based interaction in autonomous driving. However, they continue to struggle with specific reasoning tasks essential for robotic navigation. Current benchmarks typically provide only aggregate performance scores, making it difficult to assess the underlying capabilities these models require. Drive4C addresses this gap by introducing a closed-loop benchmark that evaluates semantic, spatial, temporal, and physical understanding—enabling more targeted improvements to advance foundation models for autonomous driving.
About the Speaker
Tin Stribor Sohn is a PhD student at Porsche AG and the Karlsruhe Institute of Technology, working on foundation models for scenario understanding and decision making in autonomous robotics. Tin is also Tech Lead for Data Driven Engineering for Autonomous Driving. Previously, Tin completed a Master's in Computer Science at the University of Tuebingen with a focus on computer vision and deep learning, and co-founded a software company for smart EV charging.
Human Motion Prediction – Enhanced Realism via Nonisotropic Gaussian Diffusion
Predicting future human motion is a key challenge in generative AI and computer vision, because generated motions must be both realistic and diverse. This talk presents an approach that combines top-performing latent generative diffusion models with a new paradigm, nonisotropic Gaussian diffusion, which leads to better performance, fewer parameters, and faster training at no additional computational cost. We will also discuss how such benefits can be obtained in other application domains.
About the Speaker
Cecilia Curreli is a Ph.D. student at the Technical University of Munich, specializing in generative models. A member of the AI Competence Center at MCML, she has conducted research in deep learning, computer vision, and quantum physics through international collaborations with the University of Tokyo and the Chinese Academy of Sciences.
Efficient Few-Shot Adaptation of Open-Set Detection Models
We propose an efficient few-shot adaptation method for the Grounding-DINO open-set object detection model, designed to improve performance on specialized, domain-specific datasets, such as those in agriculture, where extensive annotation is costly. The method sidesteps manual text prompt engineering by removing the standard text encoder and introducing randomly initialized, trainable text embeddings instead. These embeddings are optimized directly from a few labeled images, allowing the model to adapt quickly to new domains and object classes with minimal data. This approach outperforms zero-shot methods and competes favorably with other few-shot techniques, offering a promising solution for rapid model specialization.
About the Speaker
Dr. Sudhir Sornapudi is a Senior Data Scientist II at Corteva Agriscience. He leads the Advanced Vision Intelligence team, driving computer vision innovation internally, from cell to space, across Biotechnology, Crop Health, and Seed Operations.
OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit
Optical imaging capable of resolving nanoscale features would revolutionize scientific research and engineering applications across biomedicine, smart manufacturing, and semiconductor quality control. However, because of diffraction, optical resolution is limited to approximately half the wavelength of light, which impedes the observation of subwavelength objects such as the native-state coronavirus (typically smaller than 200 nm). Fortunately, deep learning methods have shown remarkable potential for uncovering underlying patterns in data, promising to overcome the diffraction limit by learning the mapping between diffraction images and their corresponding ground-truth object images.
However, the absence of suitable datasets has hindered progress in this field: collecting high-quality optical data of subwavelength objects is extremely difficult, because these objects are inherently invisible under conventional microscopy, which makes standard visual calibration and drift correction impossible. We therefore provide the first general optical imaging dataset based on the “building block” concept for challenging the diffraction limit. Drawing an analogy to modular construction, we build a comprehensive optical imaging dataset of subwavelength fundamental elements, i.e., small square units that can be assembled into larger and more complex objects. We then frame the problem as image-to-image translation and evaluate various vision methods. Experimental results validate the “building block” concept, demonstrating that models trained on basic square units generalize effectively to realistic, more complex unseen objects. Most importantly, by highlighting this underexplored AI-for-science area and its potential, we aspire to advance optical science by fostering collaboration with the vision and machine learning communities.
About the Speakers
Wang Benquan is a final-year PhD candidate at Nanyang Technological University, Singapore. His research interests are AI for Science, scientific deep learning, optical metrology, and imaging.
Ruyi is a PhD student at the University of Texas at Austin, working on generative models and reinforcement learning and their applications.
July 11 - Best of CVPR Virtual Event
Network event, 94 attendees from 38 groups hosting
Join us on July 11 at 9 AM Pacific for the third of several virtual events showcasing some of the most thought-provoking papers from this year’s CVPR conference.
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
As AI becomes more prevalent in fields like healthcare, ensuring its reliability under unexpected inputs is essential. We present OpenMIBOOD, a benchmarking framework for evaluating out-of-distribution (OOD) detection methods in medical imaging. It includes 14 datasets across three medical domains and categorizes them into in-distribution, near-OOD, and far-OOD groups to assess 24 post-hoc methods. Results show that OOD detection approaches effective in natural images often fail in medical contexts, highlighting the need for domain-specific benchmarks to ensure trustworthy AI in healthcare.
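"Post-hoc" OOD detection means scoring inputs using an already-trained classifier, without retraining it. One of the simplest such scores is the maximum softmax probability (MSP): a low top-class probability suggests the input may be out of distribution. The minimal sketch below only illustrates that general idea; the logits and the 0.5 threshold are made-up values, and it does not reproduce OpenMIBOOD's code or its list of 24 methods.

```python
import math

def softmax(logits):
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    # Maximum softmax probability: close to 1 when one class clearly dominates.
    return max(softmax(logits))

def flag_ood(logits, threshold=0.5):
    # Hypothetical threshold; in practice it would be calibrated on held-out
    # in-distribution data.
    return msp_score(logits) < threshold

confident = [8.0, 0.5, 0.2]  # one class clearly dominates
uncertain = [1.0, 0.9, 1.1]  # nearly uniform logits
print(flag_ood(confident), flag_ood(uncertain))  # prints: False True
```

The benchmark's finding is precisely that scores like this, which work on natural images, can break down on medical data, so a fixed threshold tuned elsewhere should not be trusted.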
About the Speaker
Max Gutbrod is a PhD student in Computer Science at OTH Regensburg, Germany, with a research focus on medical imaging. He’s working on improving the resilience of AI systems in healthcare, so they can continue performing reliably, even when faced with unfamiliar or unexpected data.
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
The choice of representation for geographic location significantly impacts model accuracy on a broad range of geospatial tasks, including fine-grained species classification, population density estimation, and biome classification. Recent works learn such representations by contrastively aligning geolocations ([lat, lon]) with co-located images.
While these methods work very well, in this paper we posit that current training strategies fail to fully capture important visual features. We provide an information-theoretic perspective on why the embeddings produced by these methods discard visual information that is crucial for many downstream tasks. To solve this problem, we propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features of multiple similar-looking locations. We show that this retrieval strategy outperforms existing state-of-the-art models by significant margins on most tasks.
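The retrieval intuition described above can be sketched as a similarity-weighted average: look up the database locations most similar to a query and blend their stored visual features. This is a generic illustration under stated assumptions, not RANGE's implementation: the database layout (location embedding paired with a visual feature), cosine similarity, `k`, and the softmax `temperature` are all hypothetical choices.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_visual_feature(query_loc, db_locs, db_feats, k=2, temperature=0.1):
    """Estimate a visual feature for query_loc from its k most similar database entries.

    db_locs[i] is a location embedding and db_feats[i] the visual feature
    stored for the same location (hypothetical database layout).
    """
    sims = [cosine(query_loc, e) for e in db_locs]
    # Indices of the k most similar locations.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax over the top-k similarities (max subtracted for stability).
    m = max(sims[i] for i in top)
    weights = [math.exp((sims[i] - m) / temperature) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the retrieved visual features.
    dim = len(db_feats[0])
    return [sum(w * db_feats[i][d] for w, i in zip(weights, top)) for d in range(dim)]

query = [1.0, 0.0]
db_locs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
db_feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
estimate = retrieve_visual_feature(query, db_locs, db_feats, k=2)
print(estimate)
```

Only the two closest locations contribute, and the closer one gets the larger weight; the dissimilar third entry is ignored entirely.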
About the Speaker
Aayush Dhakal is a Ph.D. candidate in Computer Science at Washington University in St. Louis (WashU), currently advised by Dr. Nathan Jacobs in the Multimodal Vision Research Lab (MVRL). Aayush's work focuses on solving geospatial problems using deep learning and computer vision, often combining computer vision, remote sensing, and self-supervised learning. Aayush enjoys developing methods that allow seamless interaction among multiple modalities, such as images, text, audio, and geocoordinates.
FLAIR: Fine-Grained Image Understanding through Language-Guided Representations
CLIP excels at global image-text alignment but struggles with fine-grained visual understanding. In this talk, I present FLAIR—Fine-grained Language-informed Image Representations—which leverages long, detailed captions to learn localized image features. By conditioning attention pooling on diverse sub-captions, FLAIR generates text-specific image embeddings that enhance retrieval of fine-grained content. Our model outperforms existing methods on standard and newly proposed fine-grained retrieval benchmarks, and even enables strong zero-shot semantic segmentation—despite being trained on only 30M image-text pairs.
About the Speaker
Rui Xiao is a PhD student in the Explainable Machine Learning group, supervised by Zeynep Akata of the Technical University of Munich and Stephan Alaniz of Telecom Paris. His research focuses on learning across modalities and domains, with a particular emphasis on enhancing fine-grained visual capabilities in vision-language models.
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
Semi-supervised medical image segmentation often suffers from class imbalance and high uncertainty due to pathology variability. We propose DyCON, a Dynamic Uncertainty-aware Consistency and Contrastive Learning framework that addresses these challenges via two novel losses: UnCL and FeCL. UnCL adaptively weights voxel-wise consistency based on uncertainty, initially focusing on uncertain regions and gradually shifting to confident ones. FeCL improves local feature discrimination under imbalance by applying dual focal mechanisms and adaptive entropy-based weighting to contrastive learning.
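The idea of weighting voxel-wise consistency by uncertainty, first emphasizing uncertain regions and later confident ones, can be illustrated with an entropy-based weight `exp(beta * H)` whose exponent `beta` moves from positive to negative during training. This is a generic sketch, not DyCON's UnCL loss: the student/teacher pair (a common semi-supervised setup the abstract does not specify), the binary entropy, and the linear `beta_schedule` are all assumptions, and voxels are flattened into plain lists.

```python
import math

def entropy(p):
    # Binary entropy (in nats) of a foreground probability p in [0, 1].
    eps = 1e-12
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))

def beta_schedule(step, total_steps, beta_start=1.0, beta_end=-1.0):
    # Hypothetical linear schedule: beta > 0 up-weights uncertain voxels,
    # beta < 0 up-weights confident ones.
    t = step / max(total_steps, 1)
    return beta_start + t * (beta_end - beta_start)

def weighted_consistency(student, teacher, beta):
    # Squared disagreement per voxel, weighted by exp(beta * H(teacher)),
    # normalized so the weights sum to one.
    weights = [math.exp(beta * entropy(q)) for q in teacher]
    errors = [(p - q) ** 2 for p, q in zip(student, teacher)]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

# Two voxels: the first is confident under the teacher, the second uncertain.
teacher = [0.99, 0.5]
student = [0.5, 0.45]
early = weighted_consistency(student, teacher, beta_schedule(0, 100))
late = weighted_consistency(student, teacher, beta_schedule(100, 100))
print(early, late)
```

Here the large error sits on the confident voxel, so the loss grows as the schedule shifts emphasis toward confident regions (`late > early`).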
About the Speaker
Maregu Assefa is a postdoctoral researcher at Khalifa University in Abu Dhabi, UAE. His current research focuses on advancing semi-supervised and self-supervised multi-modal representation learning for medical image analysis. Previously, his doctoral studies centered on visual representation learning for video understanding tasks, including action recognition and video retrieval.
July 17 - AI, ML and Computer Vision Meetup
Network event, 228 attendees from 39 groups hosting
When and Where
July 17, 2025 | 10:00 – 11:30 AM Pacific
Using VLMs to Navigate the Sea of Data
At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows: from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort.
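Embedding-based semantic search, as mentioned above, typically ranks stored frame embeddings by cosine similarity to a query embedding. The minimal sketch below assumes the embeddings have already been produced by a VLM's text and image encoders (for example a CLIP-style model); the 3-dimensional vectors and frame IDs are toy, hypothetical values, and this is not SEA.AI's pipeline.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products equal cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def search(query_emb, frame_embs, k=3):
    """Return the k (frame_id, score) pairs most similar to the query embedding."""
    q = normalize(query_emb)
    scored = []
    for frame_id, emb in frame_embs.items():
        e = normalize(emb)
        scored.append((frame_id, sum(a * b for a, b in zip(q, e))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-d vectors standing in for VLM embeddings (hypothetical values).
frame_embs = {
    "frame_001": [0.9, 0.1, 0.0],
    "frame_002": [0.0, 1.0, 0.0],
    "frame_003": [0.8, 0.2, 0.1],
}
query_emb = [1.0, 0.0, 0.0]  # e.g. a text embedding for "whale spout"
hits = search(query_emb, frame_embs, k=2)
print(hits)
```

At real scale, the linear scan would usually be replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.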
About the Speaker
Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer.
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters.
About the Speaker
Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks.
Building Efficient and Reliable Workflows for Object Detection
Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines.
About the Speaker
Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club.
Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets
High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems.
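One common way similarity search can surface possibly mislabeled samples is a nearest-neighbor label check: if most of a sample's closest neighbors in embedding space carry a different label, the sample is worth a manual look. This sketch only illustrates that general technique and is not necessarily what the talk presents; the toy 2-d embeddings stand in for transformer-based embeddings, and `k` and `min_agreement` are hypothetical settings.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def suspicious_labels(embs, labels, k=3, min_agreement=0.5):
    """Flag samples whose label disagrees with most of their k nearest neighbors."""
    flagged = []
    for i, e in enumerate(embs):
        # The k other samples most similar to sample i.
        neighbors = sorted((j for j in range(len(embs)) if j != i),
                           key=lambda j: cosine(e, embs[j]), reverse=True)[:k]
        votes = Counter(labels[j] for j in neighbors)
        agreement = votes[labels[i]] / len(neighbors)
        if agreement < min_agreement:
            # (index, given label, majority label among neighbors)
            flagged.append((i, labels[i], votes.most_common(1)[0][0]))
    return flagged

# Toy embeddings: a "cat" cluster and a "dog" cluster; sample 3 sits among
# the cats but is labeled "dog".
embs = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1], [0.97, 0.03],
        [0.0, 1.0], [0.05, 0.95], [0.1, 0.9]]
labels = ["cat", "cat", "cat", "dog", "dog", "dog", "dog"]
print(suspicious_labels(embs, labels))
```

Flagged samples are candidates for review, not confirmed errors; a genuine edge case can look exactly like a label mistake.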
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning and more than 20 years of experience in the technology field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture. During her PhD and postdoctoral research, she deployed multiple low-cost, smart edge and IoT computing technologies that can be operated by people without expertise in computer vision systems, such as farmers. The central objective of Paula’s research has been to develop intelligent systems and machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry.
Past events (88)
June 27 - Visual AI in Healthcare
Network event, 358 attendees from 38 groups hosting (this event has passed)