The May edition of SEA focuses on Efficiency in Neural IR, with two speakers. This is a hybrid event. Note that we are meeting in a different room than usual!
Location: 904, Science Park Amsterdam, Room C0.110
The Zoom link, in case you want to join online, will be available when you RSVP.
Speaker #1: Roberto Esposito (Weaviate)
Title: How to scale similarity search for Late Interaction Models
Abstract: Late interaction models, such as ColBERT and ColPali, have reshaped how we compute relevance in semantic search. Unlike traditional dense retrieval models that rely on a single vector embedding per document, these models produce multi-vector representations, allowing for deeper interactions between query and document. Crucially, this approach leads to significantly improved ranking performance. However, it introduces a fundamental challenge: how do we efficiently search over multiple embeddings at scale? In this talk, we'll explore how Weaviate adapts its vector indexing techniques, such as Hierarchical Navigable Small World (HNSW), to support late interaction embeddings. Finally, we'll discuss how to reduce the memory footprint of multi-vector search using vector quantization techniques like Binary Quantization (BQ), Scalar Quantization (SQ), and Product Quantization (PQ), as well as approaches to encode multi-vectors into single-vector representations for more efficient indexing.
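As a quick illustration of the late-interaction idea, here is a minimal NumPy sketch of ColBERT-style MaxSim scoring: each query token embedding is matched against its best document token embedding and the maxima are summed. This is illustrative only and not Weaviate's implementation; the shapes and normalization are assumptions.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance: for each query token embedding,
    take its maximum similarity over all document token embeddings, then sum."""
    # (num_query_tokens, num_doc_tokens) similarity matrix via dot products;
    # assumes embeddings are L2-normalized, so dot product equals cosine similarity.
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

# Toy example: 4 query token vectors and 9 document token vectors of dimension 128.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(9, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

The multi-vector representations are exactly what makes indexing hard: instead of one vector per document, the index must handle a bag of token vectors per document, which is where the adapted HNSW and quantization techniques discussed in the talk come in.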
Bio: Roberto is a Research Engineer on the Applied Research team at Weaviate, where he's currently focused on multi-vector support for the Weaviate vector database. He holds a master's degree from the University of Pisa, where his research centered on Approximate Nearest Neighbor Search (ANN) and compression algorithms.
Speaker #2: Pooya Khandel (University of Amsterdam)
Title: PEIR: Modeling Performance in Neural Information Retrieval
Abstract: The efficiency of neural information retrieval methods is primarily evaluated by measuring query latency. In practice, latency measurements are tightly tied to hardware configurations and require extensive computational resources. Given the rapid introduction of new retrieval models, an overall comparison of their efficiency is challenging. In this paper, we introduce PEIR, a framework for hardware-independent efficiency measurements in Learned Sparse Retrieval (LSR). By employing performance modeling approaches from high-performance computing, we derive performance models for query evaluation approaches such as BlockMax-MaxScore (BMM) and propose measuring memory operations and/or floating-point operations while performing retrieval on input queries. We demonstrate that, by using PEIR, we obtain conclusions similar to those of direct latency comparisons of retrieval models.
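To give a feel for hardware-independent measurement, here is a toy sketch that counts abstract operations (postings read, score updates) during retrieval over a tiny impact-scored inverted index, instead of timing it. The index, counters, and function names are hypothetical and much simpler than PEIR's actual performance models for BMM.

```python
from collections import defaultdict

# Tiny impact-scored inverted index: term -> list of (doc_id, impact).
index = {
    "neural": [(0, 1.2), (2, 0.7)],
    "retrieval": [(0, 0.9), (1, 1.1), (2, 0.4)],
}

def score_query(terms, index):
    """Term-at-a-time scoring that records abstract operation counts
    rather than wall-clock time, so the cost is hardware-independent."""
    counters = {"postings_read": 0, "score_updates": 0}
    scores = defaultdict(float)
    for t in terms:
        for doc_id, impact in index.get(t, []):
            counters["postings_read"] += 1   # one memory operation per posting
            scores[doc_id] += impact         # one floating-point add per posting
            counters["score_updates"] += 1
    return scores, counters

scores, counters = score_query(["neural", "retrieval"], index)
print(dict(scores), counters)
```

Because the counters depend only on the index and the query, two retrieval models can be compared on the same footing without access to identical hardware.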
Bio: Pooya Khandel is a former PhD candidate in the IRLab and PCS research groups at the University of Amsterdam. His work focuses on addressing large-scale challenges in NLP and IR, particularly in neural search, either through performance analysis and the application of parallel computing techniques or through data and algorithmic efficiency with sampling and distillation.
Counter: SEA Talks #281 and #282.