#1 Entity extraction by deep learning - from research to production
* by Hila Zarosim and Noam Rotem *
The Non-English Languages Team at TMS (Refinitiv’s Text Metadata Services) has recently released a deep-learning-based component for tagging people's names in French news documents. The TensorFlow-based model was trained in Python, while its final destination, the production environment, is Scala-based. In this meetup, we will describe our BILOU-based concept of entity extraction, the model structure, the quest for training data, and the challenges that arise from training in Python while the runtime is in Scala. We will then share our experience taking this solution to production: memory and latency challenges, multithreading, disk space, tracing runtime behavior, and more. We will end the session by discussing the quality of our solution and our immediate plans to improve it.
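For readers unfamiliar with BILOU, here is a minimal sketch of the general tagging scheme (Begin, Inside, Last, Outside, Unit), not the team's actual implementation. The function name `bilou_tags`, the `PER` label, and the example sentence are all made up for illustration.

```python
def bilou_tags(num_tokens, spans, label="PER"):
    """Return one BILOU tag per token.

    spans: non-overlapping (start, end) token index pairs, end exclusive.
    The label "PER" is an assumed name for the person entity type.
    """
    tags = ["O"] * num_tokens                  # Outside: not part of any entity
    for start, end in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"         # Unit: single-token entity
        else:
            tags[start] = f"B-{label}"         # Begin: first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"         # Inside: middle tokens
            tags[end - 1] = f"L-{label}"       # Last: final token
    return tags


# Invented sentence with a three-token name and a one-token name.
tokens = ["Jean", "Pierre", "Dupont", "a", "rencontré", "Macron", "."]
tags = bilou_tags(len(tokens), [(0, 3), (5, 6)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence model trained on such data predicts one of these tags per token, and entity spans are recovered by reading off each B…L run and each U tag.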
Hila Zarosim joined the research team at Refinitiv (previously known as Thomson Reuters) in 2014, after completing her Ph.D. in Computer Science at Bar-Ilan University. Since then, she has been involved in a large number of NLP projects, including text classification, relation extraction, and entity extraction in English and non-English languages.
Noam Rotem leads the Non-English Languages team at Thomson Reuters’ Refinitiv Text Metadata Services, where he tackles NLP and engineering challenges in Latin languages, Asian languages, and sometimes also in English. Before joining TMS, Noam led R&D teams and groups for 18 years, and was the CTO of various start-up companies in the cybersecurity domain.
#2 A Generative Model for Sentiment Analysis
* by Oren Sar Shalom *
Most sentiment analysis algorithms are only point estimators: their predictions carry no measure of confidence. Accompanying predictions with an estimated confidence score is a valuable output in its own right, and can also potentially improve the optimization process.
In this talk I'll present a probabilistic model for textual review generation. Its realization is based on the well-known CNN architecture for text classification, into which a novel non-parametric layer is introduced. This layer infers the inherent variance in textual reviews while simplifying the model.
The contributions of this work are improved accuracy, learned prediction confidence, and improved interpretability.
Oren is a principal data scientist at Intuit and holds a Ph.D. in Computer Science in the field of Recommender Systems. With more than 10 years of industry experience, he conducts research on both Recommender Systems and NLP-related problems. He is also a member of the ACM Future of Computing Academy (https://www.acm.org/fca) and a co-chair of the AAAI-19 workshop on recommender systems and natural language processing (RecNLP).