This Friday we'll have two talks followed by drinks.
16:00 Georgios Tsatsaronis (Elsevier) Topic Pages: From Articles to Answers
Automating the process of learning definitions from unstructured text at scale enables high-impact applications, such as building glossaries, dictionaries, or topic pages that profile scientific concepts and help readers of scientific articles understand the content faster and in more depth. In this talk we introduce Topic Pages, a publicly available set of automatically created information pages for scientific concepts across 21 domains. We discuss the technical challenges of extracting the relevant information from tens of millions of book chapters and scientific articles, as well as the novel methodologies and architecture that were used, which sit at the intersection of Machine Learning, Natural Language Processing, and Scalable Data Processing and Management. The focus will be on the best technical practices used to create this large-scale machine learning production pipeline, as well as on the novel methodology used to learn textual definitions from unstructured text, based on Multiview LSTMs.
Dr. George Tsatsaronis is Vice President Data Science, Research Content Operations, at Elsevier (RELX Group). Prior to joining Elsevier in 2016 he worked in academia for 13 years, doing research and teaching in the fields of machine learning, natural language processing and bioinformatics at universities in the UK, Greece, Norway and Germany. He has published more than 60 scientific articles in high-impact peer-reviewed journals and conference proceedings in various areas of Artificial Intelligence, primarily natural language processing and text mining. His PhD is in the field of text mining, and he also holds a BSc in Informatics from Athens University of Economics and Business, and an MSc in Advanced Computing from Imperial College London, with a specialization in Artificial Intelligence and robotics. He is the inventor of several Artificial Intelligence pipelines that support some of Elsevier's biggest research platforms.
16:30 Priyanka Agrawal (Booking.com) Unified Semantic Parsing with Weak Supervision
Semantic parsing over multiple knowledge bases enables a parser to exploit structural similarities of programs across multiple domains. However, the fundamental challenge lies in obtaining the high-quality annotations of (utterance, program) pairs across various domains that are needed to train such models. To address this problem, this talk discusses a novel framework for building a unified multi-domain semantic parser trained only with weak supervision (denotations). Weakly supervised training is particularly arduous because the program search space grows exponentially in a multi-domain setting. To solve this, we incorporate a multi-policy distillation mechanism in which we first train domain-specific semantic parsers (teachers) using weak supervision, and then train a single unified parser (student) from these domain-specific teacher policies. The resulting semantic parser is not only compact but also generalizes better and generates more accurate programs. Furthermore, it does not require the user to provide a domain label when querying. Our experiments demonstrate that the proposed model significantly improves performance over baseline techniques.
Priyanka Agrawal has recently moved to Amsterdam as a machine learning scientist at Booking.com. Prior to that, she was a senior research scientist at IBM Research Labs - India, where her work covered conversational agents and federated search systems spanning diverse knowledge bases and modalities (text and images). Her work has appeared at several top-tier conferences, including NeurIPS and ACL, and she has served as a PC member for NeurIPS 2019 and EMNLP 2019, among others.