SEA: Search Engines Amsterdam - Reproducibility


Details
The June edition of SEA has four talks by students from a course taught by the group, all of whose papers were accepted at SIGIR. This is a hybrid event. Note that we are meeting in a different room than usual!
Location: Science Park 904, Amsterdam, Room C0.110
The Zoom link, in case you want to join online, will be available when you RSVP.
Speaker #1: Oliver Savolainen/Dur e Najaf Amjad
Title: Interpreting Multilingual and Document-Length Sensitive Relevance Computations in Neural Retrieval Models through Axiomatic Causal Interventions
Abstract: This reproducibility study analyzes and extends the paper "Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models," which investigates how neural retrieval models encode task-relevant properties such as term frequency. We reproduce key experiments from the original paper, confirming that information on query terms is captured in the model encoding. We extend this work by applying activation patching to Spanish and Chinese datasets and by exploring whether document-length information is encoded in the model as well. Our results confirm that the designed activation patching method can isolate the behavior to specific components and tokens in neural retrieval models. Moreover, our findings indicate that the location of term frequency generalizes across languages and that in later layers, the information for sequence-level tasks is represented in the CLS token. The results highlight the need for further research into interpretability in information retrieval and reproducibility in machine learning research.
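The core intervention in the abstract above, activation patching, can be illustrated with a toy example. The two-layer scorer, random weights, and inputs below are synthetic stand-ins for a neural retrieval model, not the paper's actual setup: we cache an activation from a "baseline" run and splice it into a "perturbed" run to test whether one token's hidden state carries the signal.

```python
import numpy as np

# Toy per-token "retrieval scorer" (hypothetical stand-in, not the paper's model).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4,))

def forward(tokens, patch=None):
    """Score a document; optionally patch one token's hidden state.

    tokens: (n_tokens, 4) input embeddings
    patch: (token_index, cached_hidden_vector) or None
    Returns (score, hidden) so activations can be cached for later patching.
    """
    hidden = np.tanh(tokens @ W1)      # layer-1 activations, one row per token
    if patch is not None:
        idx, vec = patch
        hidden = hidden.copy()
        hidden[idx] = vec              # the causal intervention
    score = float(hidden.mean(axis=0) @ W2)
    return score, hidden

# Baseline doc has a boosted "query term" at token 0; the perturbed doc does not.
base = rng.normal(size=(3, 4))
base[0] += 2.0
pert = base.copy()
pert[0] -= 2.0

s_base, h_base = forward(base)
s_pert, _ = forward(pert)
# Patch the perturbed run with the baseline activation at token 0:
s_patched, _ = forward(pert, patch=(0, h_base[0]))
```

Because the toy model processes tokens independently in layer 1, patching token 0 restores the baseline score exactly; in a real transformer retriever the restoration is partial, and its size localizes where the term-frequency information lives.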
Speaker #2: Jakub Podolak
Title: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion
Abstract: In this talk, we’ll describe Setwise prompting - the zero-shot document-ranking strategy first introduced by Zhuang et al. - and share the results of our full reproducibility study, which confirms its balance of ranking quality and compute cost against Pointwise, Pairwise, and Listwise baselines. We’ll then introduce Setwise Insertion, our new variant that injects the initial retrieval ranking as prior knowledge, so the LLM skips low-value comparisons and concentrates on the documents most likely to move up the ranking. Tested on Flan-T5, Vicuna, and Llama, Setwise Insertion trims query latency by 31 percent, cuts model calls by 23 percent, and even nudges effectiveness upward - all without any fine-tuning. Join us to see how a small tweak to the prompt can make zero-shot reranking both faster and better, and to discuss what this means for practical, scalable retrieval systems.
Link: https://arxiv.org/abs/2504.10509
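The benefit of reusing the first-stage ranking as prior knowledge can be illustrated with a plain insertion sort driven by a comparison oracle. The oracle and synthetic relevance scores below are stand-ins for LLM calls, and this sketch is not the authors' Setwise Insertion implementation: it only shows that starting from a near-correct ordering drastically cuts the number of oracle comparisons.

```python
def pairwise_prefers(a, b, scores):
    """Stand-in for one LLM comparison call: is doc a more relevant than doc b?"""
    return scores[a] > scores[b]

def insertion_rerank(order, scores):
    """Insertion sort over the comparison oracle.
    Returns (ranking best-first, number of oracle calls made)."""
    ranking = []
    calls = 0
    for doc in order:
        i = len(ranking)
        # Walk the doc up while it beats its left neighbour.
        while i > 0:
            calls += 1
            if pairwise_prefers(doc, ranking[i - 1], scores):
                i -= 1
            else:
                break
        ranking.insert(i, doc)
    return ranking, calls

# Synthetic relevance scores (hypothetical, for illustration only).
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.6, "d4": 0.4, "d5": 0.1}
good_prior = ["d1", "d2", "d3", "d4", "d5"]    # first-stage retrieval order
random_prior = ["d5", "d3", "d1", "d4", "d2"]  # no prior knowledge

r_good, calls_good = insertion_rerank(good_prior, scores)
r_random, calls_random = insertion_rerank(random_prior, scores)
```

Both starting orders converge to the same final ranking, but the informed start needs far fewer oracle calls (4 vs. 9 here), which is the intuition behind skipping low-value comparisons.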
Speaker #3: Emmanouil Georgios Lionis
Title: Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
Abstract: Text data are often encoded as dense vectors, known as embeddings, which capture semantic, syntactic, contextual, and domain-specific information. These embeddings, widely adopted in various applications, inherently contain rich information that may be susceptible to leakage under certain attacks. The GEIA framework highlights vulnerabilities in sentence embeddings, demonstrating that they can reveal the original sentences they represent. In this study, we reproduce GEIA's findings across various neural sentence embedding models. Additionally, we contribute new analysis to examine whether these models leak sensitive information from their training datasets. We propose a simple yet effective method that requires no modification to the attacker architecture proposed in GEIA. The key idea is to examine differences between the log-likelihood of masked and original variants of the data that sentence embedding models were pre-trained on, computed in the attacker's embedding space. Our findings indicate that, with our approach, an adversary can recover meaningful sensitive information related to the pre-training knowledge of popular sentence embedding models, seriously undermining their security.
Link: https://arxiv.org/abs/2504.16609
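The masked-vs-original log-likelihood signal described above can be sketched with a toy model. Here the "attacker" is a smoothed unigram language model fit on a tiny pre-training corpus; everything is synthetic and is not the GEIA attacker architecture. The point is only the decision rule: the likelihood gap between an original sentence and its masked variant is larger when the model has seen the sentence's tokens in pre-training.

```python
import math
from collections import Counter

# Tiny stand-in "pre-training corpus" (hypothetical).
corpus = ["the cat sat on the mat", "dogs chase cats"]
counts = Counter(tok for s in corpus for tok in s.split())
total = sum(counts.values())

def loglik(sentence):
    """Laplace-smoothed unigram log-likelihood under the toy attacker model."""
    return sum(
        math.log((counts[tok] + 1) / (total + len(counts)))
        for tok in sentence.split()
    )

def membership_signal(sentence, mask="[MASK]"):
    """Drop in log-likelihood when one token is masked out.
    A larger gap suggests the model 'knows' the masked token."""
    toks = sentence.split()
    masked = " ".join([mask] + toks[1:])  # mask the first token
    return loglik(sentence) - loglik(masked)

seen = membership_signal("the cat sat on the mat")          # in the corpus
unseen = membership_signal("quantum flux harmonics resonate")  # never seen
```

For the in-corpus sentence the gap is positive (the model prefers the real token over the mask), while for the unseen sentence both variants are equally implausible and the gap vanishes.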
Speaker #4: Thijmen Nijdam
Title: Reproducing NevIR: Negation in Neural Information Retrieval
Abstract: Negation remains a persistent challenge in neural Information Retrieval (IR), despite the central role Language Models (LMs) play in modern systems. In this talk, we present our recent study that reproduces and expands upon the NevIR benchmark, which previously highlighted the limitations of IR models in handling negation, often performing no better than random ranking. We begin by revisiting the original NevIR experiments and extending the evaluation to include several recently developed, state-of-the-art IR models. Our results show that while newer listwise LLM re-rankers perform better than earlier methods, their understanding of negation still falls short of human-level performance. To probe generalisability, we also evaluate these models using ExcluIR, a benchmark focused on exclusionary queries rich in negation. Our analysis reveals that improvements on one benchmark do not reliably transfer to the other, underscoring the nuanced differences between these datasets. Notably, only cross-encoders and listwise LLM re-rankers demonstrate consistent, though still limited, success across both tasks. Our talk will explore these findings in detail, discuss implications for the design of future IR systems, and offer insights into the complexity of negation as a linguistic and modeling challenge.
Link: https://arxiv.org/pdf/2502.13506
Counter: SEA Talks #283, #284, #285 and #286.

Every last Friday of the month