NLP: Bridging Languages Through Images and Weakly Supervised Semantic Parsing
Details
Agenda:
17:30-18:30: Bridging Languages Through Images with Deep Partial Canonical Correlation Analysis by Guy Rotman
18:30-18:40: Networking break
18:40-19:40: Weakly Supervised Semantic Parsing with Abstract Examples by Omer Goldman
Bridging Languages Through Images with Deep Partial Canonical Correlation Analysis
We present a deep neural network that leverages images to improve bilingual text embeddings. Relying on bilingual image tags and descriptions, our approach conditions text embedding induction on the shared visual information for both languages, producing highly correlated bilingual embeddings. In particular, we propose a novel model based on Partial Canonical Correlation Analysis (PCCA). We introduce a non-linear Deep PCCA (DPCCA) model and develop a new stochastic iterative algorithm for its optimization. We evaluate PCCA and DPCCA on multilingual word similarity and cross-lingual image description retrieval. Our models outperform a wide range of previous methods, despite having no access to any visual signal at test time.
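As a rough illustration of the idea behind PCCA, the sketch below implements the classical linear form of partial CCA in NumPy: both text views are first residualized on a third, conditioning view, and ordinary CCA is then run on the residuals. It is not the speaker's model or optimization algorithm, and it does not cover the deep (DPCCA) variant. The function names, the ridge term `reg`, and the reading of `X`, `Y`, and `Z` as features for language 1, language 2, and images are assumptions made for illustration.

```python
import numpy as np


def residualize(A, Z):
    """Return the part of A that a least-squares fit on Z cannot explain."""
    coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return A - Z @ coef


def inv_sqrt(C, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T


def partial_cca(X, Y, Z, k, reg=1e-6):
    """Linear partial CCA of X and Y given Z (illustrative sketch).

    X: (n, dx) features for language 1 (assumed)
    Y: (n, dy) features for language 2 (assumed)
    Z: (n, dz) conditioning features, e.g. image features (assumed)
    Returns projection matrices A (dx, k), B (dy, k) and the top-k
    partial canonical correlations.
    """
    # Center every view so that covariances are plain inner products.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Z = Z - Z.mean(axis=0)

    # Remove the linear effect of the conditioning view from both text views.
    Rx = residualize(X, Z)
    Ry = residualize(Y, Z)

    n = X.shape[0]
    Cxx = Rx.T @ Rx / n + reg * np.eye(Rx.shape[1])  # small ridge for stability
    Cyy = Ry.T @ Ry / n + reg * np.eye(Ry.shape[1])
    Cxy = Rx.T @ Ry / n

    # Standard CCA on the residuals via SVD of the whitened cross-covariance.
    Wx = inv_sqrt(Cxx)
    Wy = inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)

    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, s[:k]
```

In this sketch the conditioning view is needed only to fit `A` and `B`; new text features can be projected with `A` or `B` alone, which loosely mirrors the abstract's point that no visual signal is required at test time.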
Guy Rotman holds an MSc degree in Information Management Engineering from the Technion, Israel Institute of Technology. Previously, he received a BSc in Industrial Engineering and Management and a BA in Economics and Management from the Technion, graduating summa cum laude in both.
His research interests are multilingual and multimodal models in NLP and optimization algorithms for deep neural networks.
Weakly Supervised Semantic Parsing with Abstract Examples
Training semantic parsers from weak supervision (denotations) rather than strong supervision (programs) complicates training in two ways. First, a large search space of potential programs needs to be explored at training time to find a correct program. Second, spurious programs that accidentally lead to a correct denotation add noise to training. In this work we propose that in closed worlds with clear semantic types, one can substantially alleviate these problems by utilizing an abstract representation, where tokens in both the language utterance and program are lifted to an abstract form. We show that these abstractions can be defined with a handful of lexical rules and that they result in sharing between different examples that alleviates the difficulties in training.
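To make the notion of lifting tokens to an abstract form more concrete, here is a toy sketch in Python. The lexicon, the type names `COLOR` and `SHAPE`, and the tokenized program syntax are all hypothetical and chosen only for illustration; they are not the rules or the formal language used in the talk.

```python
# Hypothetical lexical rules: each known word maps to a semantic type.
LEXICON = {
    "yellow": "COLOR", "blue": "COLOR", "black": "COLOR",
    "circle": "SHAPE", "square": "SHAPE", "triangle": "SHAPE",
}


def abstract(tokens):
    """Replace typed words with their type; remember what was replaced."""
    abstract_tokens, bindings = [], []
    for tok in tokens:
        sem_type = LEXICON.get(tok.lower())
        if sem_type is None:
            abstract_tokens.append(tok)
        else:
            abstract_tokens.append(sem_type)
            bindings.append((sem_type, tok.lower()))
    return abstract_tokens, bindings


def instantiate(abstract_program, bindings):
    """Fill type placeholders in a program with the bound words, in order."""
    pools = {}
    for sem_type, word in bindings:
        pools.setdefault(sem_type, []).append(word)
    program = []
    for tok in abstract_program:
        if pools.get(tok):
            program.append(pools[tok].pop(0))
        else:
            program.append(tok)
    return program


# Two different utterances collapse to the same abstract utterance, so a
# single abstract program (toy syntax) can serve both examples.
ABSTRACT_PROGRAM = ["exists", "(", "filter", "(", "COLOR", ",", "SHAPE", ")", ")"]

utt_a, bind_a = abstract("there is a yellow circle".split())
utt_b, bind_b = abstract("there is a blue square".split())
prog_a = instantiate(ABSTRACT_PROGRAM, bind_a)
prog_b = instantiate(ABSTRACT_PROGRAM, bind_b)
```

Because both utterances map to the same abstract form, a program found for one example can be reused for the other, which is the kind of sharing between examples that the abstract describes.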
Omer Goldman is an MSc student under the supervision of Jonathan Berant in the CS department at Tel Aviv University, researching NLP bordering on Computational Linguistics. He received his BSc in Physics and Linguistics (cum laude) from Tel Aviv University in 2014.
