Mixed-script Information Retrieval

Hosted By
Katie B. and Dan B.

Details

Doors and networking at 6:00 PM; talk followed by a Q&A from 7:00 to 8:30 PM.

Venue: 44 Tehama St, San Francisco, CA 94105

Classroom 311

Speaker: Parth Gupta (bio: http://users.dsic.upv.es/~pgupta/), researcher at the Natural Language Engineering Lab (http://users.dsic.upv.es/grupos/nle/?file=kop1.php) at the Technical University of Valencia, Spain

Title: Mixed-script Information Retrieval

Abstract: For many languages that use non-Roman indigenous scripts (e.g., Arabic, Greek, and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. Such content creates a monolingual or multilingual space with more than one script, which is referred to as the Mixed-Script space. IR in the Mixed-Script space is challenging because queries written in either the native or the Roman script need to be matched to documents written in both scripts. Moreover, transliterated content features extensive spelling variations. Through analysis of the query logs of the Bing search engine, the talk will introduce Mixed-Script IR, discuss its prevalence, and explain the details of a principled, deep-learning-based solution to the term modeling challenge, in which Mixed-Script terms are modeled jointly through a deep autoencoder. The talk will close by discussing the impact of Mixed-Script IR on popular NLP applications such as sentiment analysis, recommendations, machine translation, and cross-language text analysis in user-generated content.

Bay Area NLP (Natural Language Processing)
Galvanize
44 Tehama Street (between 2nd and Howard) · San Francisco, CA