Search, Mismatches, and Biases

Details
This month we will have a special edition of SEA, SEA++, organized in collaboration with and hosted at SPUI25. Instead of two talks, we'll have three: two academic talks and one industry talk, followed by drinks. The academic talks will be given by Professor Diane Kelly from the University of North Carolina at Chapel Hill and Ilya Markov from ILPS-UvA, and Dolf Trieschnigg from MyDataFactory will give the industry talk. The titles and abstracts of all three talks are given below.
Please note that this edition of SEA will be held at SPUI25.
Program:
15:30 - 16:00 Diane Kelly (http://ils.unc.edu/~dianek/) (University of North Carolina at Chapel Hill)
16:00 - 16:30 Ilya Markov (https://staff.fnwi.uva.nl/i.markov/) (ILPS, UvA)
16:30 - 17:00 Dolf Trieschnigg (http://dolf.trieschnigg.nl) (MyDataFactory)
17:00 - 18:00 Drinks
Details of the talks:
-------------------------------------------------------------------------------------
Diane Kelly--Search Results Navigator: A Tool to Help Users Overcome Biases
There has been a recent rise in scholarship and popular press coverage focusing on the potential biases built into search tools and other online services that rely on large amounts of user data for their operations. This literature raises many interesting questions about the advancement of particular ideological worldviews, the privileging of certain types of information and the eradication of human judgment. The rise of this literature has paralleled a rise in research in the information retrieval community focused on user bias. This research has primarily focused on the biases users exhibit when interacting with information. Most often, the goal is to (machine) learn these biases so they can be incorporated into the search algorithm, with little regard for how systems might be designed to help users overcome their own biases or any biases that have been built into the search system. In this talk, I will describe the Search Results Navigator, a search interface tool designed to help users overcome biases, including those introduced by system designers as well as those they might consciously and unconsciously engage in during search. I will describe the methods we used to evaluate the tool, with some commentary on the challenges of conducting user-centered evaluations, and present some preliminary results of our evaluation.
Diane Kelly is a Professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Her research and teaching interests are in interactive information search and retrieval, information search behavior, and research methods. Kelly is the recipient of a Francis Carroll McColl Term Professorship at UNC, the 2014 ASIST Research Award and the 2013 British Computer Society’s IRSG Karen Spärck Jones Award. She is the recipient of two teaching awards: the 2009 ASIST/Thomson Reuters Outstanding Information Science Teacher Award and the 2007 SILS Outstanding Teacher of the Year Award. She is the current ACM SIGIR treasurer and recently co-chaired the inaugural ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR).
-------------------------------------------------------------------------------------
Ilya Markov--Removing bias from user interaction data
User interaction with search engines is affected by various biases. For example, users tend to click on top search results (position bias), and they can be attracted by visually salient content such as images (attention bias). At the same time, user interaction data contains invaluable information about users and their interests and preferences in search and, thus, is heavily used by search engines to improve their quality. However, to reliably use this interaction data and to uncover actual user preferences, these biases must be removed first. This talk discusses the problem of bias in user interaction data and approaches to removing it. After a general discussion, the talk focuses on a particular type of user interaction, namely the time between user actions in search (e.g., time between clicks, time to first click, or time between queries). We show that such times are context-biased, i.e., they are affected by the context in which they are observed (e.g., the ranks of clicked documents or the user's search history). To remove the context bias, we model the time between user actions as a probability distribution. The parameters of this distribution are composed of two components: a context-dependent one and a context-independent one. After learning these components using neural networks, we show that the context-aware model approximates the time between user actions significantly better than models that do not consider context. Moreover, by splitting the model into context-dependent and context-independent parts, we remove the context bias from the latter. As a result, we show that the context-independent component can be used to improve the quality of search.
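The abstract describes the approach only at a high level. The Python sketch below is a minimal illustration under assumptions that are not from the talk: the time between two actions follows an exponential distribution; the context is a single hypothetical numeric feature `x` (think of a normalized rank of the clicked result); and a log-linear model fitted by gradient descent stands in for the neural networks. The log-rate is split into a context-independent term `b` and a context-dependent term `w * x`, mirroring the two components described in the abstract. All names and the synthetic data are made up; this is not the speaker's model.

```python
import math
import random


def fit_time_model(times, contexts, lr=0.2, epochs=1000):
    """Fit an exponential time model whose log-rate has two additive parts.

    Assumption (not from the talk): log(rate) = b + w * x, where b is a
    context-independent component and w * x is a context-dependent component
    for one scalar context feature x. A log-linear model trained by batch
    gradient descent on the negative log-likelihood replaces a neural network.
    """
    b, w = 0.0, 0.0
    n = len(times)
    for _ in range(epochs):
        grad_b = grad_w = 0.0
        for t, x in zip(times, contexts):
            rate = math.exp(b + w * x)
            # Per-observation NLL of an exponential is -log(rate) + rate * t;
            # its derivative with respect to log(rate) is rate * t - 1.
            g = rate * t - 1.0
            grad_b += g
            grad_w += g * x
        b -= lr * grad_b / n
        w -= lr * grad_w / n
    return b, w


# Hypothetical synthetic data: x in [0, 1) plays the role of a context feature.
# With TRUE_W > 0, larger x gives a higher rate, i.e. shorter times.
random.seed(0)
TRUE_B, TRUE_W = -1.0, 0.5
contexts = [random.random() for _ in range(1000)]
times = [random.expovariate(math.exp(TRUE_B + TRUE_W * x)) for x in contexts]

b_hat, w_hat = fit_time_model(times, contexts)
# The context-independent part alone yields a context-free mean time, exp(-b).
debiased_mean_time = math.exp(-b_hat)
print(f"b={b_hat:.2f} w={w_hat:.2f} debiased mean time={debiased_mean_time:.2f}")
```

Keeping the two parts additive in the log-rate is what allows the context-independent term to be read off on its own once the model is fitted; a richer version would learn the context-dependent term with a network and might use a more flexible distribution.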
Ilya Markov is a postdoctoral researcher at the University of Amsterdam. His research agenda is built around information retrieval methods for heterogeneous search environments. Ilya has experience in federated search, user behavior analysis, click models and effectiveness metrics. He is a PC member of leading IR conferences such as SIGIR, CIKM and WWW, a PC chair of the RuSSIR 2015 summer school, and a co-organizer of the IMine-2 task at NTCIR-12. Ilya is currently teaching an MSc course on web search; he has previously taught information retrieval courses at the BSc and MSc levels and given courses at IR conferences and summer schools. Ilya obtained his PhD at the University of Lugano and was a visiting researcher at the University of Strathclyde, the University of Glasgow and Yandex.
-------------------------------------------------------------------------------------
Dolf Trieschnigg--Flexible Matching of Product Data
Product data is everywhere, ranging from size and colour information about products on e-commerce websites to specifications of spare parts in enterprise databases. Finding the desired product in such a database is difficult because of the mismatch between product descriptions. Product metadata might be described or spelled differently, the same description might have multiple meanings, or vital information might be missing. In this talk I will discuss the challenges of product search and how we deal with these issues at Mydatafactory. I will talk about how we use and adapt Elasticsearch to deal with some of these problems in the context of industrial product data.
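The abstract does not say how Mydatafactory's matching works. As a rough illustration of the mismatch problem it describes (the same product written with different spellings, abbreviations and word order), here is a stdlib-only Python sketch that normalizes descriptions and then picks the closest catalog entry with `difflib`. The synonym table, threshold and catalog are hypothetical, and this is neither Elasticsearch nor the speaker's method; in Elasticsearch, comparable steps would typically be handled with custom analyzers (e.g., lowercase and synonym token filters) and fuzzy queries.

```python
import difflib
import re

# Hypothetical synonym table (phrase -> canonical form). Real product data
# would need a curated, domain-specific list.
SYNONYMS = {
    "stainless steel": "ss",
    "inox": "ss",
    "hex": "hexagon",
    "millimetre": "mm",
}


def normalize(description: str) -> str:
    """Map differently written product descriptions to a canonical string."""
    text = description.lower()
    # Split digits from letters, e.g. "m8x40mm" -> "m 8 x 40 mm".
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)
    text = re.sub(r"([a-z])(\d)", r"\1 \2", text)
    # Turn punctuation into spaces (dots are kept for decimals like "2.5").
    text = re.sub(r"[^a-z0-9.]+", " ", text)
    # Replace known variants with their canonical form, whole words only.
    for phrase, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    tokens = [tok.strip(".") for tok in text.split()]
    # Sort tokens so that word order does not affect the comparison.
    return " ".join(sorted(tok for tok in tokens if tok))


def best_match(query: str, catalog: list[str], threshold: float = 0.8):
    """Return (best catalog entry, score), or (None, score) below threshold."""
    normalized_query = normalize(query)
    best, best_score = None, 0.0
    for item in catalog:
        score = difflib.SequenceMatcher(None, normalized_query, normalize(item)).ratio()
        if score > best_score:
            best, best_score = item, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical catalog for illustration.
CATALOG = [
    "Hexagon bolt M8 x 40 mm, stainless steel",
    "Hexagon bolt M10 x 40 mm, galvanized",
    "Washer M8, stainless steel",
]

print(best_match("hex bolt m8x40mm inox", CATALOG))
# → ('Hexagon bolt M8 x 40 mm, stainless steel', 1.0)
```

Sorting tokens makes the comparison insensitive to word order, at the cost of ignoring which number belongs to which attribute; a production system would more likely parse attributes (size, material, thread) into separate fields before matching.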
Dolf Trieschnigg received an MSc in computer science and a PhD in information retrieval from the University of Twente. In his PhD project he investigated various techniques for dealing with semantic mismatch in searching biomedical literature. During the last five years he worked as a postdoctoral researcher on various information retrieval and text mining topics, ranging from federated web search and social media analysis to language identification and keyword extraction. Since April 2015 he has been working as a data scientist at Mydatafactory in Meppel, where he is responsible for the matching and extraction algorithms used in the Mydatafactory data cleansing application.
