Details

We are excited to announce our SEA session for June on innovations in search indexing. In this session, Dr. Sebastian Bruch (Pinecone) will present his recent work on optimistic query routing for maximum inner product search, and Hansi Zeng (University of Massachusetts Amherst) will present his recent work on scaling up generative retrieval models (DSI).

You can join us online or in person at Lab42, room L3.36. Please note that this SEA meet-up starts at 16:00 CET.

***
IMPORTANT: You can view the Zoom link once you 'attend' the meetup on this page.
***
Speaker: Dr. Sebastian Bruch (Pinecone)
Title: Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search
Abstract: Clustering-based nearest neighbor search is a simple yet effective method in which data points are partitioned into geometric shards to form an index, and only a few shards are searched during query processing to find an approximate set of top-k vectors. Even though the search efficacy is heavily influenced by the algorithm that identifies the set of shards to probe, it has received little attention in the literature. This work attempts to bridge that gap by studying the problem of routing in clustering-based maximum inner product search (MIPS). We begin by unpacking existing routing protocols and notice the surprising contribution of optimism. We then take a page from the sequential decision-making literature and formalize that insight following the principle of "optimism in the face of uncertainty." In particular, we present a new framework that incorporates the moments of the distribution of inner products within each shard to optimistically estimate the maximum inner product. We then present a simple instance of our algorithm that uses only the first two moments to reach the same accuracy as state-of-the-art routers such as ScaNN by probing up to 50% fewer points on a suite of benchmark MIPS datasets. Our algorithm is also space-efficient: we design a sketch of the second moment whose size is independent of the number of points and in practice requires storing only O(1) additional vectors per shard.
Time: 16:00

SEA Talk #270
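
For readers new to the topic, the routing idea in the abstract above can be illustrated with a rough sketch. This is not the speaker's implementation: it assumes each shard stores its full mean vector and covariance matrix (the talk describes a compact sketch of the second moment instead), and the function names and the `delta` parameter, which controls how optimistic the estimate is, are hypothetical. Each shard is scored by the mean inner product with the query plus `delta` standard deviations, and only the highest-scoring shards are scanned.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def shard_moments(shard):
    """Return the mean vector and covariance matrix of one shard's points."""
    n, dim = len(shard), len(shard[0])
    mean = [sum(p[j] for p in shard) / n for j in range(dim)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in shard) / n
            for j in range(dim)] for i in range(dim)]
    return mean, cov

def optimistic_score(query, mean, cov, delta):
    """Mean inner product plus delta standard deviations (the optimism bonus)."""
    mean_ip = dot(query, mean)                              # first moment
    var_ip = dot(query, [dot(row, query) for row in cov])   # second moment: q^T C q
    return mean_ip + delta * math.sqrt(max(var_ip, 0.0))

def route(query, moments, num_probe, delta=1.0):
    """Rank shards by optimistic score and return the indices of the top num_probe."""
    scores = [optimistic_score(query, m, c, delta) for m, c in moments]
    return sorted(range(len(moments)), key=lambda s: -scores[s])[:num_probe]

def search(query, shards, moments, num_probe, k, delta=1.0):
    """Scan only the routed shards; return the k points with the largest inner product."""
    candidates = [p for s in route(query, moments, num_probe, delta) for p in shards[s]]
    return sorted(candidates, key=lambda p: -dot(query, p))[:k]

# Hypothetical data: four shards of 50 Gaussian points each, centred at different values.
random.seed(0)
shards = [[[random.gauss(c, 1.0) for _ in range(4)] for _ in range(50)]
          for c in (-2, 0, 1, 3)]
moments = [shard_moments(s) for s in shards]
query = [random.gauss(0, 1) for _ in range(4)]
top = search(query, shards, moments, num_probe=2, k=5)
```

With `delta=0` this reduces to routing by the mean inner product alone; larger values favour shards whose inner products with the query are more spread out.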

Speaker: Hansi Zeng (University of Massachusetts Amherst)
Title: Scalable and Effective Generative Information Retrieval
Time: 16:30
Abstract: TBA

SEA Talk #271