Dense Passage Retrieval at SEA with Akari Asai and Arnold Overwijk
***
IMPORTANT: You will be able to view the Zoom link once you 'attend' the meetup on this page.
***
SEA is back in October! This month's edition features Akari Asai (U. of Washington) and Arnold Overwijk (Microsoft).
17.00: Akari Asai (U. of Washington)
Title: One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval
Abstract: We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state-of-the-art on multilingual open question answering benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.
17.30: Arnold Overwijk (Microsoft)
Title: Document Understanding at Microsoft
Abstract: Arnold Overwijk leads the document understanding team at Microsoft. His team develops state-of-the-art deep learning technologies for natural language problems, which are applied to large-scale production systems for search and recommendation scenarios. The technologies they develop include information retrieval, ranking, question answering, machine reading comprehension, key phrase extraction, topic modeling, and semantic document understanding (a logical understanding of the document based on visual and text information in a multi-modal setting, e.g., identifying section headings, paragraphs, publication date, and author).
