Federated Web Search & Evaluation in Search Engine Design

Are you a developer or researcher interested in data science? Join our next meetup on federated web search and effective search engine design!

Talks will be in English. Please sign up so we can estimate how much catering to order.


18:00 - 18:30 Pizzas, drinks, networking
18:30 - 19:00 Lessons learned from Federated Search in the Wild (Djoerd Hiemstra)
19:00 - 19:30 Integrating evaluation into the search engine design process (Wouter Alink)
19:30 - 20:30 More drinks & networking

Abstracts and bios can be found below.

Lessons learned from Federated Search in the Wild (Djoerd Hiemstra, University of Twente)

Federated search has the potential to improve web search: the user becomes less dependent on a single search provider, and parts of the so-called deep web become available through a unified interface, leading to a wider variety of retrieved search results. I will present the lessons we learned from running the Federated Web Search task of the Text Retrieval Conference (TREC), cherry-picking from the results of the best participating systems among the 40 research groups that took part over the course of several years. I will conclude the talk by discussing our future plans: running the University of Twente's search engine as a federation of more than 25 smaller search engines, including courses, news, publications, telephone numbers, images, Google Custom Search, Twitter, and YouTube.

Bio: Djoerd Hiemstra is an associate professor at the University of Twente. He wrote an often-cited Ph.D. thesis on language models for information retrieval and contributed to over 200 research papers in the field of information retrieval. His research interests include formal models of information retrieval, peer-to-peer and federated search, and statistical natural language processing. Djoerd contributed to several open source search prototypes and published papers with research labs of several search engine companies, including Microsoft (where he did an internship in 2000), Yahoo (where he was a visiting researcher in 2008), and Yandex (which he visited in 2011).

Integrating evaluation into the search engine design process (Wouter Alink, Spinque)

Traditional Information Retrieval studies focussed on evaluating algorithms against test corpora (Cranfield experiments). Unfortunately, this practice hasn't been picked up by the vast majority of search engines implemented in industry. These are hardly ever evaluated using test corpora; evaluation is often done on a per-case basis. On the other hand, it has been difficult to find real test cases to use for evaluation purposes in scientific studies. Within companies, real test cases (query logs, product purchases) are plentiful but often can't or won't be shared due to privacy and sensitivity issues. In this talk I will discuss Spinque's new search-by-strategy editor, which integrates the creation and use of test sets and evaluations into the design process of a search engine. Eventually this may help solve privacy issues with in-company data, as search algorithms could be shared, tested and compared without actually having to obtain the logs.

Bio: Wouter Alink studied computer science at the University of Twente and did his MSc research at the Centrum Wiskunde & Informatica (CWI) and the Netherlands Forensic Institute (NFI). He developed a novel, XML-based approach to managing and querying forensic traces extracted from digital evidence. This approach has been implemented in XIRAF, a tool now widely used within the Dutch police for forensic analysis, which provides the forensic investigator with a rich query environment in which browsing, searching, and predefined query templates are all expressed in terms of XML database queries. He applied the same ideas at the University of Amsterdam to improve their question answering technology. Alink is one of the founders of Spinque.