
What we’re about
This is the official group for the Data Science Salon Community in San Francisco/Bay Area: https://datascience.salon/
Our mission is to bring together everyone who is interested in and working with cloud technologies. Learn, network, present, and meet others in the cloud space.
Access on demand content on YouTube: https://www.youtube.com/c/DataScienceSalon
Upcoming events (4+)
June 19 - AI, ML and Computer Vision Meetup (network event, 280 attendees from 39 groups hosting)
When
June 19, 2025 | 10:00 AM Pacific
Where
Online. Register for the Zoom.
Multi-Modal Rare Events Detection for SAE L2+ to L4
A burst tire on the highway or a fallen motorcyclist occurs rarely, but such edge cases demand extra effort from autonomous vehicles. This talk explains methods for tackling these rare events in road scenarios.
About the Speaker
Wolfgang Schulz is Product Owner for Lidar Perception at Continental. He has worked in the automotive industry since 2005 and, with his team, currently develops components for an SAE L4 stack.
Voxel51 + NVIDIA Omniverse: Exploring the Future of Synthetic Data
Join us for a lightning talk on one of the most exciting frontiers in Visual AI: synthetic data. We’ll showcase a sneak peek of the new integration between FiftyOne and NVIDIA Omniverse, featuring fully synthetic downtown scenes of San Jose. NVIDIA Omniverse is enabling the generation of ultra-precise synthetic sensor data, including LiDAR, RADAR, and camera feeds, while FiftyOne is making it easy to extract value from these rich datasets. Come see the future of sensor simulation and dataset curation in action, with pixel-perfect labels to match.
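For readers curious what the dataset-curation side can look like in practice, here is a minimal sketch of pulling a folder of synthetic renders into FiftyOne for browsing and tagging. The directory path, dataset name, and the "synthetic" tag are illustrative assumptions; the Omniverse integration previewed in the talk is not shown here.

```python
import fiftyone as fo

# Minimal sketch: curate a folder of synthetic renders in FiftyOne.
# The directory path and dataset name are illustrative placeholders,
# not part of the Omniverse integration previewed in the talk.
dataset = fo.Dataset.from_images_dir(
    "/data/omniverse_renders",  # hypothetical folder of rendered frames
    name="synthetic-driving-scenes",
)

# Tag every sample so synthetic data stays distinguishable after merging
# with real captures.
for sample in dataset:
    sample.tags.append("synthetic")
    sample.save()

# Explore the curated dataset in the FiftyOne App.
session = fo.launch_app(dataset)
```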
About the Speaker
Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
We propose O-TPT, a method to improve the calibration of vision-language models (VLMs) during test-time prompt tuning. While prompt tuning improves accuracy, it often leads to overconfident predictions. O-TPT introduces orthogonality constraints on textual features, enhancing feature separation and significantly reducing calibration error across multiple datasets and model backbones.
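As a rough illustration of the idea (not the exact O-TPT objective), a minimal sketch of an orthogonality penalty on the class text features might look like this:

```python
import torch

def orthogonality_penalty(text_features: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise similarity between class text features.

    text_features: (num_classes, dim) embeddings produced by the text
    encoder from the tunable prompts. Pushing the off-diagonal entries of
    the Gram matrix toward zero encourages near-orthogonal features.
    This is a simplified illustration, not the exact O-TPT loss.
    """
    feats = torch.nn.functional.normalize(text_features, dim=-1)
    gram = feats @ feats.t()                        # (C, C) cosine similarities
    off_diag = gram - torch.diag(torch.diag(gram))  # zero out the diagonal
    num_pairs = feats.shape[0] * (feats.shape[0] - 1)
    return (off_diag ** 2).sum() / num_pairs

# During test-time prompt tuning, this term would be added to the usual
# tuning objective, e.g. loss = entropy_loss + lam * orthogonality_penalty(t),
# where entropy_loss, lam, and t are placeholders for the method's own terms.
```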
About the Speaker
Ashshak Sharifdeen is a visiting student researcher at Mohamed bin Zayed University of Artificial Intelligence, UAE.
Advancing MLLMs for 3D Scene Understanding
Recent advances in Multimodal Large Language Models (MLLMs) have shown impressive reasoning capabilities in 2D image and video understanding. However, these models still face significant challenges in achieving holistic comprehension of complex 3D scenes. In this talk, we present our recent progress toward enabling global 3D scene understanding for MLLMs. We will cover newly developed benchmarks, evaluation protocols, and methods designed to bridge the gap between language and 3D perception.
About the Speaker
Xiongkun Linghu is a research engineer at the Beijing Institute for General Artificial Intelligence (BIGAI). His research focuses on Multimodal Large Language Models and Embodied Artificial Intelligence, with an emphasis on 3D scene understanding and grounded reasoning.
June 27 - Visual AI in Healthcare (network event, 205 attendees from 38 groups hosting)
Join us for the third of several virtual events focused on the latest research, datasets and models at the intersection of visual AI and healthcare.
When
June 27 at 9 AM Pacific
Where
Online. Register for the Zoom!
MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders
We present MedVAE, a family of six generalizable 2D and 3D variational autoencoders trained on over one million images from 19 open-source medical imaging datasets using a novel two-stage training strategy. MedVAE downsizes high-dimensional medical images into compact latent representations, reducing storage by up to 512× and accelerating downstream tasks by up to 70× while preserving clinically relevant features. We demonstrate across 20 evaluation tasks that these latent representations can replace high-resolution images in computer-aided diagnosis pipelines without compromising performance. MedVAE is open-source with a streamlined finetuning pipeline and inference engine, enabling scalable model development in resource-constrained medical imaging settings.
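To illustrate the general pattern of working in a compact latent space rather than on full-resolution images (the actual MedVAE models, weights, and interfaces come from the open-source release and are not reproduced here), a toy sketch with a stand-in encoder:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a pretrained 2D autoencoder; the real MedVAE
# architectures and downsizing factors are defined in the open-source release.
class TinyEncoder(nn.Module):
    def __init__(self, in_ch: int = 1, latent_ch: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

encoder = TinyEncoder().eval()

# Encode a batch of grayscale images once, cache the compact latents, and
# train downstream models on the latents instead of full-resolution images.
with torch.no_grad():
    images = torch.randn(8, 1, 256, 256)   # placeholder medical images
    latents = encoder(images)              # (8, 4, 64, 64): 16x fewer spatial positions
```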
About the Speakers
Ashwin Kumar is a PhD Candidate in Biomedical Physics at Stanford University, advised by Akshay Chaudhari and Greg Zaharchuk. He focuses on developing deep learning methodologies to advance medical image acquisition and analysis.
Maya Varma is a PhD student in computer science at Stanford University. Her research focuses on the development of artificial intelligence methods for addressing healthcare challenges, with a particular focus on medical imaging applications.
Leveraging Foundation Models for Pathology: Progress and Pitfalls
How do you train ML models on pathology slides that are thousands of times larger than standard images? Foundation models offer a breakthrough approach to these gigapixel-scale challenges. This talk explores how self-supervised foundation models trained on broad histopathology datasets are transforming computational pathology. We’ll examine their progress in handling weakly-supervised learning, managing tissue preparation variations, and enabling rapid prototyping with minimal labeled examples. However, significant challenges remain: increasing computational demands, the potential for bias, and questions about generalizability across diverse populations. This talk will offer a balanced perspective to help separate foundation model hype from genuine clinical value.
About the Speaker
Heather D. Couture is a consultant and founder of Pixel Scientia Labs, where she partners with mission-driven founders and R&D teams to support applications of computer vision for people and planetary health. She has a PhD in Computer Science and has published in top-tier computer vision and medical imaging venues. She hosts the Impact AI Podcast and writes regularly on LinkedIn, for her newsletter Computer Vision Insights, and for a variety of other publications.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Recent advances in promptable segmentation have transformed medical imaging workflows, yet most existing models are constrained to static 2D or 3D applications. This talk presents LesionLocator, the first end-to-end framework for universal 4D lesion segmentation and tracking using dense spatial prompts. The system enables zero-shot tumor analysis across whole-body 3D scans and multiple timepoints, propagating a single user prompt through longitudinal follow-ups to segment and track lesion progression. Trained on over 23,000 annotated scans and supplemented with a synthetic time-series dataset, LesionLocator achieves human-level performance in segmentation and outperforms state-of-the-art baselines in longitudinal tracking tasks. The presentation also highlights advances in 3D interactive segmentation, including our open-set tool nnInteractive, showing how spatial prompting can scale from user-guided interaction to clinical-grade automation.
About the Speaker
Maximilian Rokussis is a PhD scholar at the German Cancer Research Center (DKFZ), working in the Division of Medical Image Computing under Klaus Maier-Hein. He focuses on 3D multimodal and multi-timepoint segmentation with spatial and text prompts. With several MICCAI challenge wins and first-author publications at CVPR and MICCAI, he co-leads the Helmholtz Medical Foundation Model initiative and develops AI solutions at the interface of research and clinical radiology.
LLMs for Smarter Diagnosis: Unlocking the Future of AI in Healthcare
Large Language Models are rapidly transforming the healthcare landscape. In this talk, I will explore how LLMs like GPT-4 and DeepSeek-R1 are being used to support disease diagnosis, predict chronic conditions, and assist medical professionals without relying on sensitive patient data. Drawing from my published research and real-world applications, I’ll discuss the technical challenges, ethical considerations, and the future potential of integrating LLMs in clinical settings. The talk will offer valuable insights for developers, researchers, and healthcare innovators interested in applying AI responsibly and effectively.
About the Speaker
Gaurav K Gupta graduated from Youngstown State University with a Bachelor’s in Computer Science and Mathematics.
July 9 - Best of CVPR (network event, 82 attendees from 39 groups hosting)
Join us for a series of virtual events focused on the most interesting and groundbreaking research presented at this year's CVPR conference!
When
July 9, 2025 at 9 AM Pacific
Where
Online. Register for the Zoom!
What Foundation Models really need to be capable of for Autonomous Driving – The Drive4C Benchmark
Foundation models hold the potential to generalize the driving task and support language-based interaction in autonomous driving. However, they continue to struggle with specific reasoning tasks essential for robotic navigation. Current benchmarks typically provide only aggregate performance scores, making it difficult to assess the underlying capabilities these models require. Drive4C addresses this gap by introducing a closed-loop benchmark that evaluates semantic, spatial, temporal, and physical understanding—enabling more targeted improvements to advance foundation models for autonomous driving.
About the Speaker
Tin Stribor Sohn is a PhD student at Porsche AG and the Karlsruhe Institute of Technology, working on foundation models for scenario understanding and decision making in autonomous robotics, and serves as Tech Lead for Data Driven Engineering for Autonomous Driving. He previously earned a Master’s in Computer Science at the University of Tuebingen, focusing on computer vision and deep learning, and co-founded a software company for smart EV charging.
Human Motion Prediction – Enhanced Realism via Nonisotropic Gaussian Diffusion
Predicting future human motion is a key challenge in generative AI and computer vision, as generated motions should be realistic and diverse at the same time. This talk presents a novel approach that leverages top-performing latent generative diffusion models with a novel paradigm. Nonisotropic Gaussian diffusion leads to better performance, fewer parameters, and faster training at no additional computational cost. We will also discuss how such benefits can be obtained in other application domains.
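As a toy illustration of what "nonisotropic" means here (the talk's actual parameterization is not reproduced, and the per-dimension scales below are an assumption for illustration), a forward-diffusion step with dimension-dependent noise might look like this:

```python
import torch

def forward_diffuse(x0: torch.Tensor, alpha_bar_t: float, sigma_diag: torch.Tensor) -> torch.Tensor:
    """One forward-diffusion sample with per-dimension (nonisotropic) noise.

    Isotropic diffusion uses the same noise scale for every latent dimension;
    here each dimension gets its own scale sigma_diag[d]. This is only a toy
    illustration of the concept, not the paper's exact formulation.
    """
    eps = torch.randn_like(x0) * sigma_diag   # diagonal-covariance Gaussian noise
    return (alpha_bar_t ** 0.5) * x0 + ((1 - alpha_bar_t) ** 0.5) * eps

x0 = torch.randn(16, 64)                      # toy latent motion codes
sigma = torch.linspace(0.5, 1.5, 64)          # hypothetical per-dimension noise scales
xt = forward_diffuse(x0, alpha_bar_t=0.7, sigma_diag=sigma)
```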
About the Speaker
Cecilia Curreli is a Ph.D. student at the Technical University of Munich, specializing in generative models. A member of the AI Competence Center at MCML, she has conducted research in deep learning, computer vision, and quantum physics through international collaborations with the University of Tokyo and the Chinese Academy of Sciences.
Efficient Few-Shot Adaptation of Open-Set Detection Models
We propose an efficient few-shot adaptation method for the Grounding-DINO open-set object detection model, designed to improve performance on domain-specific specialized datasets like agriculture, where extensive annotation is costly. The method circumvents the challenges of manual text prompt engineering by removing the standard text encoder and instead introduces randomly initialized, trainable text embeddings. These embeddings are optimized directly from a few labeled images, allowing the model to quickly adapt to new domains and object classes with minimal data. This approach demonstrates superior performance over zero-shot methods and competes favorably with other few-shot techniques, offering a promising solution for rapid model specialization.
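To make the idea concrete, here is a hedged sketch of optimizing randomly initialized class embeddings against a frozen detector on a few labeled examples. The StubDetector below is a self-contained stand-in, not the real Grounding-DINO interface, and the training loop is a simplification.

```python
import torch
import torch.nn as nn

# Stand-in for a frozen open-set detector that consumes image features plus
# per-class text embeddings; it is NOT the real Grounding-DINO interface.
class StubDetector(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Linear(3 * 32 * 32, dim)      # toy "image encoder"

    def forward(self, images: torch.Tensor, class_embeds: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images.flatten(1))         # (B, dim)
        return feats @ class_embeds.t()                  # (B, num_classes) logits

num_classes, dim = 3, 256
detector = StubDetector(dim)
for p in detector.parameters():
    p.requires_grad_(False)                              # keep the detector frozen

# Randomly initialized, trainable class embeddings replace the text encoder.
class_embeds = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
optimizer = torch.optim.AdamW([class_embeds], lr=1e-3)

# Few-shot adaptation on a handful of labeled images (toy data here).
images = torch.randn(12, 3, 32, 32)
labels = torch.randint(0, num_classes, (12,))
for _ in range(50):
    logits = detector(images, class_embeds)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```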
About the Speaker
Dr. Sudhir Sornapudi is a Senior Data Scientist II at Corteva Agriscience. He leads the Advanced Vision Intelligence team, driving computer vision innovations internally, from cell to space, across Biotechnology, Crop Health, and Seed Operations.
OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit
Optical imaging capable of resolving nanoscale features would revolutionize scientific research and engineering applications across biomedicine, smart manufacturing, and semiconductor quality control. However, due to the physical phenomenon of diffraction, the optical resolution is limited to approximately half the wavelength of light, which impedes the observation of subwavelength objects such as the native state coronavirus, typically smaller than 200 nm. Fortunately, deep learning methods have shown remarkable potential in uncovering underlying patterns within data, promising to overcome the diffraction limit by revealing the mapping pattern between diffraction images and their corresponding ground truth object images.
However, the absence of suitable datasets has hindered progress in this field: collecting high-quality optical data of subwavelength objects is highly difficult, as these objects are inherently invisible under conventional microscopy, making it impossible to perform standard visual calibration and drift correction. Therefore, we provide the first general optical imaging dataset based on the “building block” concept for challenging the diffraction limit. Drawing an analogy to modular construction principles, we construct a comprehensive optical imaging dataset comprising subwavelength fundamental elements, i.e., small square units that can be assembled into larger and more complex objects. We then frame the task as an image-to-image translation task and evaluate various vision methods. Experimental results validate our “building block” concept, demonstrating that models trained on basic square units can effectively generalize to realistic, more complex unseen objects. Most importantly, by highlighting this underexplored AI-for-science area and its potential, we aspire to advance optical science by fostering collaboration with the vision and machine learning communities.
About the Speakers
Wang Benquan is a final-year PhD candidate at Nanyang Technological University, Singapore. His research interests are AI for Science, scientific deep learning, and optical metrology and imaging.
Ruyi is a PhD student at the University of Texas at Austin, working on generative models, reinforcement learning, and their applications.
July 11 - Best of CVPR Virtual Event (network event, 78 attendees from 38 groups hosting)
Join us on July 11 at 9 AM Pacific for the third of several virtual events showcasing some of the most thought-provoking papers from this year’s CVPR conference.
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
As AI becomes more prevalent in fields like healthcare, ensuring its reliability under unexpected inputs is essential. We present OpenMIBOOD, a benchmarking framework for evaluating out-of-distribution (OOD) detection methods in medical imaging. It includes 14 datasets across three medical domains and categorizes them into in-distribution, near-OOD, and far-OOD groups to assess 24 post-hoc methods. Results show that OOD detection approaches effective in natural images often fail in medical contexts, highlighting the need for domain-specific benchmarks to ensure trustworthy AI in healthcare.
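For readers unfamiliar with the term, a post-hoc method reuses a trained classifier's outputs without retraining. A classic baseline, maximum softmax probability, is sketched below as an illustration; it is not claimed to be representative of the 24 methods benchmarked.

```python
import torch

def msp_ood_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability (MSP), a classic post-hoc OOD score.

    Post-hoc methods reuse a trained classifier's outputs without retraining:
    a low maximum softmax probability suggests the input may be
    out-of-distribution. Thresholds are calibrated on held-out
    in-distribution data.
    """
    probs = torch.softmax(logits, dim=-1)
    return probs.max(dim=-1).values        # higher = more in-distribution

logits = torch.randn(4, 10)                # toy classifier outputs
scores = msp_ood_score(logits)
is_ood = scores < 0.5                      # illustrative threshold, tune per dataset
```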
About the Speaker
Max Gutbrod is a PhD student in Computer Science at OTH Regensburg, Germany, with a research focus on medical imaging. He’s working on improving the resilience of AI systems in healthcare, so they can continue performing reliably, even when faced with unfamiliar or unexpected data.
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
The choice of representation for geographic location significantly impacts the accuracy of models for a broad range of geospatial tasks, including fine-grained species classification, population density estimation, and biome classification. Recent works learn such representations by contrastively aligning geolocation [lat, lon] with co-located images.
While these methods work exceptionally well, in this paper, we posit that the current training strategies fail to fully capture the important visual features. We provide an information-theoretic perspective on why the resulting embeddings from these methods discard crucial visual information that is important for many downstream tasks. To solve this problem, we propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. We show this retrieval strategy outperforms the existing state-of-the-art models by significant margins on most tasks.
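As a rough sketch of the retrieval intuition only (the databank construction, similarity measure, and multi-resolution weighting in RANGE itself differ), one could estimate a location's visual features like this:

```python
import numpy as np

def retrieval_augmented_embedding(
    query_geo: np.ndarray,     # (d,) geo-embedding of the query location
    bank_geo: np.ndarray,      # (N, d) geo-embeddings of databank locations
    bank_visual: np.ndarray,   # (N, v) visual features of databank locations
    k: int = 16,
) -> np.ndarray:
    """Combine visual features of the k most similar databank locations.

    Illustrative only: the real RANGE method defines its own databank,
    similarity measure, and multi-resolution weighting.
    """
    sims = bank_geo @ query_geo / (
        np.linalg.norm(bank_geo, axis=1) * np.linalg.norm(query_geo) + 1e-8
    )
    top = np.argsort(-sims)[:k]            # indices of the k most similar locations
    weights = np.exp(sims[top])
    weights /= weights.sum()               # similarity-based weighting
    return weights @ bank_visual[top]      # (v,) estimated visual features

# Toy usage with random data
rng = np.random.default_rng(0)
est = retrieval_augmented_embedding(
    rng.normal(size=64), rng.normal(size=(1000, 64)), rng.normal(size=(1000, 128))
)
```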
About the Speaker
Aayush Dhakal is a Ph.D. candidate in Computer Science at Washington University in St. Louis (WashU), advised by Dr. Nathan Jacobs in the Multimodal Vision Research Lab (MVRL). His work focuses on solving geospatial problems using deep learning and computer vision, often combining computer vision, remote sensing, and self-supervised learning. He enjoys developing methods that allow seamless interaction between multiple modalities, such as images, text, audio, and geocoordinates.
FLAIR: Fine-Grained Image Understanding through Language-Guided Representations
CLIP excels at global image-text alignment but struggles with fine-grained visual understanding. In this talk, I present FLAIR—Fine-grained Language-informed Image Representations—which leverages long, detailed captions to learn localized image features. By conditioning attention pooling on diverse sub-captions, FLAIR generates text-specific image embeddings that enhance retrieval of fine-grained content. Our model outperforms existing methods on standard and newly proposed fine-grained retrieval benchmarks, and even enables strong zero-shot semantic segmentation—despite being trained on only 30M image-text pairs.
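A minimal, single-head sketch of text-conditioned attention pooling is shown below; it is a simplification for illustration, not FLAIR's actual pooling module or training objective.

```python
import torch
import torch.nn as nn

class TextConditionedPooling(nn.Module):
    """Pool patch tokens with a text embedding as the attention query.

    Single-head simplification: the text feature attends over image patch
    tokens, producing an image embedding specific to that caption.
    """
    def __init__(self, dim: int = 512):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, patch_tokens: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim); text_embed: (B, dim)
        q = self.q(text_embed).unsqueeze(1)                  # (B, 1, dim)
        k, v = self.k(patch_tokens), self.v(patch_tokens)    # (B, N, dim)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return (attn @ v).squeeze(1)                         # (B, dim) text-specific image embedding

pool = TextConditionedPooling()
img_embed = pool(torch.randn(2, 196, 512), torch.randn(2, 512))
```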
About the Speaker
Rui Xiao is a PhD student in the Explainable Machine Learning group, supervised by Zeynep Akata from Technical University of Munich and Stephan Alaniz from Telecom Paris. His research focuses on learning across modalities and domains, with a particular emphasis on enhancing fine-grained visual capabilities in vision-language models.
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
Semi-supervised medical image segmentation often suffers from class imbalance and high uncertainty due to pathology variability. We propose DyCON, a Dynamic Uncertainty-aware Consistency and Contrastive Learning framework that addresses these challenges via two novel losses: UnCL and FeCL. UnCL adaptively weights voxel-wise consistency based on uncertainty, initially focusing on uncertain regions and gradually shifting to confident ones. FeCL improves local feature discrimination under imbalance by applying dual focal mechanisms and adaptive entropy-based weighting to contrastive learning.
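As an illustration of uncertainty-weighted consistency in the spirit of UnCL (the exact DyCON formulation and its adaptive schedule are not reproduced here), consider:

```python
import torch

def uncertainty_weighted_consistency(
    student_logits: torch.Tensor,   # (B, C, D, H, W) predictions on one view
    teacher_logits: torch.Tensor,   # (B, C, D, H, W) predictions on another view
    beta: float = 1.0,              # how strongly uncertain voxels are emphasized
) -> torch.Tensor:
    """Voxel-wise consistency loss weighted by predictive entropy.

    Illustrative only: DyCON's UnCL uses its own adaptive weighting that
    shifts emphasis from uncertain to confident regions over training;
    scheduling beta would be one way to mimic that behaviour.
    """
    p_s = torch.softmax(student_logits, dim=1)
    p_t = torch.softmax(teacher_logits, dim=1)
    entropy = -(p_t * torch.log(p_t + 1e-8)).sum(dim=1)   # (B, D, H, W) uncertainty
    weights = torch.exp(beta * entropy)                   # emphasize uncertain voxels
    per_voxel = ((p_s - p_t) ** 2).sum(dim=1)             # squared prediction gap
    return (weights * per_voxel).mean()

loss = uncertainty_weighted_consistency(
    torch.randn(1, 2, 8, 32, 32), torch.randn(1, 2, 8, 32, 32)
)
```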
About the Speaker
Maregu Assefa is a postdoctoral researcher at Khalifa University in Abu Dhabi, UAE. His current research focuses on advancing semi-supervised and self-supervised multi-modal representation learning for medical image analysis. Previously, his doctoral studies centered on visual representation learning for video understanding tasks, including action recognition and video retrieval.
Past events (86)
June 18 - Getting Started with FiftyOne Workshop (network event, 105 attendees from 38 groups hosting)