Past Meetup

Mixed-script Information Retrieval

50 people went

Doors and networking at 6:00 PM; talk followed by a Q&A from 7:00 to 8:30 PM.

Venue: 44 Tehama St, San Francisco, CA 94105

Classroom 311

Speaker: Parth Gupta, researcher at the Natural Language Engineering Lab at the Technical University of Valencia, Spain

Title: Mixed-script Information Retrieval

Abstract: For many languages that use non-Roman indigenous scripts (e.g., Arabic, Greek, and Indic languages), one can often find a large amount of user-generated transliterated content on the Web in the Roman script. Such content creates a monolingual or multilingual space with more than one script, which is referred to as the Mixed-Script space. IR in the Mixed-Script space is challenging because queries written in either the native or the Roman script need to be matched to documents written in both scripts. Moreover, transliterated content features extensive spelling variations. Through an analysis of the query logs of the Bing search engine, the talk will introduce Mixed-Script IR and discuss its prevalence. It will then explain a principled, deep-learning-based solution to the term modeling challenge, in which Mixed-Script terms are modeled jointly through a deep autoencoder. The talk will close by discussing the impact of Mixed-Script IR on popular NLP applications such as sentiment analysis, recommendations, machine translation, and cross-language text analysis in user-generated content.