August 27, 2014 · 6:30 PM
Faster and Smaller Inverted Indexes with Treaps
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that the space consumption is below 10% of the size of the corpus and the index performs queries up to twice as fast than previous compact representations, which in addition require more space.
Roberto Konow is a computer science Ph.D candidate and part-time researcher and lecturer at Universidad Diego Portales. His Ph.D topic and interests are related to Algorithms, Data Structures, Compression and Information Retrieval. His Ph.D advisor is Dr. Gonzalo Navarro. Before he started his Ph.D, he worked in a Japanese company, Skillupjapan, as research engineer in projects related to recommender systems. He is currently a Ph.D Technical Intern at eBay working on optimizations for the search engine.
6:30 Eat & Greet
7:00 Talk - Great speakers, good food, free beer.
Event will be held at the eBay campus just off 17/880 @ Hamilton in the main Community building. Look for lobby/flagpole.
I'm always looking for speakers. Please suggest speakers or topics you would like to hear.