ConceptDrift: Uncovering Biases through the Lens of Foundation Models


Details
An important goal of ML research is to identify and mitigate unwanted biases intrinsic to datasets and already incorporated into pre-trained models. Previous approaches identified biases using highly curated validation subsets, which require human knowledge to create in the first place. This limits the ability to automate the discovery of unknown biases in new datasets. We address this by using interpretable vision-language models, combined with a filtration method based on LLMs and known concept hierarchies. More precisely, for a given dataset, we start from the pre-trained CLIP embedding associated with each class and track how it drifts during learning towards embeddings that disclose hidden biases. We call this approach ConceptDrift and show that it scales to automatically identifying biases in datasets like ImageNet, without human prior knowledge. We propose two bias-identification evaluation protocols to fill a gap in previous work, and show that our method significantly improves over SoTA methods, both under our protocols and under classical evaluations. Beyond validating the identified biases, we also show that they can be leveraged to improve the performance of different methods. Our method is not bound to a single modality: we empirically validate it on both image (Waterbirds, CelebA, ImageNet) and text (CivilComments) datasets.
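To make the core idea concrete, below is a minimal sketch (not the authors' code) of how a class embedding's drift could be scored against candidate bias concepts. The class name, the concept vocabulary, and the simulated fine-tuned weight `w_final` are illustrative assumptions; in the actual method, the final weight would come from training on the dataset a linear head initialized with the CLIP text embedding of the class.

```python
# Hedged sketch of drift-based bias probing with CLIP text embeddings.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    """Return L2-normalized CLIP text embeddings for a list of strings."""
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

class_name = "waterbird"                                  # hypothetical class
concepts = ["water", "forest", "bamboo", "ocean", "sky"]  # hypothetical candidate concepts

# Initial class embedding, before any learning on the dataset.
w_init = embed([f"a photo of a {class_name}"])[0]

# w_final would normally be the class weight after fine-tuning a linear
# head initialized at w_init; here we simulate a drift towards "water"
# purely for illustration.
w_final = w_init + 0.3 * embed(["water"])[0]
w_final = w_final / w_final.norm()

# Score each candidate concept by its alignment with the drift direction:
# high-scoring concepts are candidates for hidden biases of the class.
drift = w_final - w_init
drift = drift / drift.norm()
scores = embed(concepts) @ drift

for concept, score in sorted(zip(concepts, scores.tolist()), key=lambda t: -t[1]):
    print(f"{concept:10s} {score:+.3f}")
```

In this toy setup, "water" (and semantically close concepts such as "ocean") would rank highest, mirroring the well-known background bias in Waterbirds; the paper's full pipeline additionally filters the candidate concepts with LLMs and concept hierarchies.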
Authors: Cristian Daniel Păduraru, Antonio Bărbălau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu
Paper: https://arxiv.org/abs/2410.18970
* Short version accepted at the NeurIPS 2024 Workshop on Interpretable AI.
Code: TBA
Speaker: Elena Burceanu
The presentation will be held in Romanian.
Physical location only: Faculty of Mathematics and Computer Science, University of Bucharest - Google Room, floor 2, first room near the staircase
