Multilingual Information Extraction from short text messages


Details
Abstract of the talk:
Natural language processing (NLP) systems must handle text in multiple languages to serve richly multilingual societies like India. The advent of joint multilingual models like Google’s mBERT and Facebook’s XLM-R paved the way to realise a wide range of NLP tasks in multiple languages. These models still require adaptation with appropriate datasets to provide efficient language representations for specific use cases. In this talk, a conversational assistant that works for multiple Indian languages is discussed, and intent classification, as a part of natural language understanding of customer queries, is explained in detail. A dedicated sales query assistant expects similar types of customer messages, even if they come in multiple languages. Banking on this ‘inherently parallel’ nature of the text data, we observed that a multilingual model fine-tuned only on Malayalam text provided reliable intent classification accuracy in Hindi, Tamil & Telugu; the approach can be extended to other Indian languages as well.
Pre-requisites for Attendees:
- An interest in AI and conversational assistants
- Basic knowledge of text processing, ML models and model adaptation
Take Aways From The Talk:
You will learn about massively multilingual models, and their abilities and limitations in knowledge transfer across languages.
About Karthika:
Karthika Vijayan is a Solution Consultant at Sahaj AI Software, focussing on voice & text based AI and data science applications. Karthika holds Bachelor’s and Master’s degrees in Electronics and Communication Engineering, and a PhD in Speech Processing from the Indian Institute of Technology Hyderabad. She later worked as a postdoctoral research associate at the Indian Institute of Science Bangalore (1 year) and as a research fellow at the National University of Singapore (4 years). She has extensive experience working on projects related to automatic speech recognition, speech synthesis, automatic speaker recognition and singing voice processing. Her research interests include speech processing and natural language processing for AI, pattern recognition, machine learning and deep learning.
