Transfer learning for dialog and off-policy evaluation for recommendation

Every last Friday of the month

Details

This Friday we'll have two talks followed by drinks.

16:00 Thomas Wolf (Hugging Face) - A Transfer Learning Approach to Open-Domain Neural Network Dialog Agents

16:30 Rolf Jagerman (University of Amsterdam) - When People Change their Mind: Off-Policy Evaluation in Non-stationary Recommendation Environments

========================

16:00 Thomas Wolf (Hugging Face) - A Transfer Learning Approach to Open-Domain Neural Network Dialog Agents

Free-form dialogue systems (also called chatbots) are dialog agents designed to interact with humans in open-ended conversations ("small talk"). Developing these systems tackles the general research question of how a model can generate a coherent and interesting discussion with a human being. These dialog agents are thus test-beds for many interactive AI systems and are also directly useful in applications ranging from technical support services to entertainment. However, building such intelligent conversational agents remains an unsolved problem. I will present a summary and the technical details of our entry to the Conversational Intelligence Challenge 2 (convai.io), held as part of the NeurIPS 2018 conference in Montreal in early December 2018, which won the automatic evaluation track. The Conversational Intelligence Challenge aimed at testing how an agent could be provided with a simple personality and common-sense reasoning abilities in order to generate meaningful responses. Our agent showed significant improvements on the three tested metrics (perplexity, answer retrieval accuracy and response generation F1) over the baseline and the second-ranked model. These improvements were obtained by using a transfer learning scheme from a large corpus combined with a multi-task fine-tuning scheme that was able to take advantage of several inductive biases.
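
As a rough illustration of the multi-task fine-tuning idea mentioned in the abstract, here is a minimal sketch (not the speaker's actual code) that combines a language-modeling loss with a next-utterance classification loss on top of a shared model; the toy GRU encoder, loss weight and random data are hypothetical stand-ins for the pretrained transformer and ConvAI2 dialog data used in practice.

# Minimal sketch of a two-headed, multi-task fine-tuning objective:
# a shared encoder feeds (1) a language-modeling head and (2) a head
# that classifies whether a candidate reply is the true next utterance.
# All sizes, weights and data below are hypothetical.
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64  # toy vocabulary and hidden size

class DoubleHeadModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)  # stand-in for a pretrained transformer
        self.lm_head = nn.Linear(DIM, VOCAB)   # predicts the next token
        self.cls_head = nn.Linear(DIM, 2)      # scores whether the reply is the true next utterance

    def forward(self, tokens):
        hidden, _ = self.encoder(self.embed(tokens))
        return self.lm_head(hidden), self.cls_head(hidden[:, -1])

model = DoubleHeadModel()
tokens = torch.randint(0, VOCAB, (8, 20))       # token ids (persona + history + candidate reply)
lm_targets = torch.randint(0, VOCAB, (8, 20))   # next-token targets
cls_targets = torch.randint(0, 2, (8,))         # 1 = real next utterance, 0 = distractor

lm_logits, cls_logits = model(tokens)
lm_loss = nn.functional.cross_entropy(lm_logits.reshape(-1, VOCAB), lm_targets.reshape(-1))
cls_loss = nn.functional.cross_entropy(cls_logits, cls_targets)
loss = lm_loss + 0.5 * cls_loss                 # hypothetical weighting of the two tasks
loss.backward()

In the full setting the classification head typically scores the gold reply against sampled distractor utterances, so the model is trained jointly to generate text and to pick the right continuation.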
========================

16:30 Rolf Jagerman (University of Amsterdam) - When People Change their Mind: Off-Policy Evaluation in Non-stationary Recommendation Environments

Collecting online metrics via A/B testing is a gold standard for evaluating a new policy, e.g. a new feature or recommendation model. However, A/B tests are not without pitfalls. They can be expensive in terms of engineering or logistical overhead, and even harmful to the user experience, as an untested policy is exposed to the end users of the system.
Methods from off-policy evaluation tell us how historical interaction data can be leveraged to estimate the performance of a new policy offline, alleviating some of the problems surrounding A/B testing. However, existing off-policy estimators do not work well in non-stationary environments, i.e. environments where users' behavior changes over time.
In this talk I will give a brief introduction to the non-stationary off-policy evaluation problem and present two new estimators that perform significantly better than standard estimators in non-stationary environments. Our findings open the way for off-policy estimators to be applied in practical real-world settings where non-stationarity is prevalent.
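
As background for the abstract above, here is a minimal, hypothetical sketch of the standard inverse propensity scoring (IPS) estimator that off-policy evaluation usually starts from: logged rewards are reweighted by the ratio of the new policy's action probabilities to the logging policy's. The policies and synthetic click data are made up for illustration; this is the stationary baseline, not the new estimators presented in the talk.

# Standard IPS estimator on synthetic logged data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_items, n_logs = 5, 10_000

# Logged interactions: the logging policy picked items with known propensities.
logging_probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
actions = rng.choice(n_items, size=n_logs, p=logging_probs)
rewards = rng.binomial(1, np.linspace(0.1, 0.5, n_items)[actions])  # e.g. clicks

# New policy to evaluate offline: its probability of recommending each item.
target_probs = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

# IPS: reweight each logged reward by how likely the new policy was to take
# the same action, relative to the logging policy that produced the log.
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = np.mean(weights * rewards)
print(f"Estimated reward of the new policy: {ips_estimate:.3f}")

This estimator is unbiased when user behavior is stationary; when preferences drift over time, the logged rewards no longer reflect current behavior, which is the failure mode the talk addresses.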