Conversational Data & Hebrew NLP


Details
Hi Everyone,
Very happy to announce the 9th Israeli NLP Meetup! This time we'll have two great talks: Dr. Raphael Cohen from Chorus.ai (http://chorus.ai) will tell us about extracting insights from conversational data using the NLP and data science toolbox, and Amir More from the Open University will tell us about cutting-edge research in Hebrew NLP and syntactic parsing.
We will meet at the new Tel Aviv headquarters of Chorus.ai, who are kindly sponsoring this Meetup.
Looking forward to seeing you there,
Roee
Talk abstract:
NLP for Extracting Insights from Conversational Data
Recent advances in Automatic Speech Recognition (ASR) technology based on deep learning allow for much higher accuracy in extracting information from conversations. This creates a new opportunity for conversational analytics, a new domain in data science that investigates human interactions, which until recently were not amenable to analysis at scale because speech recognition was so hard to apply. Chorus has developed a service for recording customer interactions and extracting insights for coaching, optimizing the sales process, and delivering customer responses and needs directly to product designers and product management. We will present the opportunities and challenges in applying data science and Natural Language Processing techniques to this hybrid data of voice, conversation and text, gathered from analyzing 3,000 hours of sales discussions. Specifically, we will review the stack needed for answering basic questions: in-domain speech recognition, speaker separation (diarization), automatically assigning multiple labels to the data, and statistical analysis of the results. We’ll address the questions: How should you apply NLP to transcribed data to analyze questions? How can word embeddings help us spot transcription errors?
Bio: Raphael is a senior scientist at Chorus, where he applies state-of-the-art Natural Language Processing and Speech Analysis algorithms for extracting insights and enriching our customers' calls. Before Chorus he worked at EMC as a principal data scientist, finding innovative business solutions based on the organization's big data in areas such as Customer Success and predicting hard-drive failures. Raphael has a PhD in CS from Ben-Gurion University on the topic of NLP for Medical Hebrew under Prof. Michael Elhadad and an M.Sc in Genetics under Prof. Ohad Birk.
Talk abstract:
Morpho-Syntactic Parsing of Modern Hebrew
Modern Hebrew presents formidable barriers to its inclusion in NLP applications due to the incompatibilities it introduces with NLP pipelines. As a Morphologically Rich Language (MRL), it violates traditional structuralists’ assumptions on the relationship between surface tokens in raw text and the syntactic words that form the nodes of a dependency tree. In this presentation, we introduce the audience to syntactic processing in general and dependency grammar in particular, as well as examples of downstream applications such as semantic parsing. We then present the problems introduced by MRLs, previous solutions, and the solution proposed in the speaker’s M.Sc thesis on joint morpho-syntactic processing, the current published state of the art for Modern Hebrew. We then briefly introduce neural models, and discuss problems encountered when applying them to Modern Hebrew and MRLs. If time allows, we present the participation of Hebrew in the Universal Dependencies (UD) project, an initiative with participants all over the globe that builds consistently annotated treebanks, with 70 treebanks covering 50 languages (and counting). We briefly introduce the CoNLL 2017 Shared Task, and discuss relevant results for Hebrew.
Bio: Amir More is a researcher at the Open University's NLP Lab. His recently completed M.Sc thesis under Dr. Reut Tsarfaty focused on advancing the state of the art in Morphological and Syntactic Processing of Modern Hebrew.