
Learning to Detect Stance and Represent Emojis: Isabelle Augenstein (UCL)

Hosted By
Pontus Stenetorp

Details

In this two-part talk, I will first present our work on stance detection (EMNLP 2016) and then our work on learning emoji representations (SocialNLP@EMNLP 2016, best paper).

Stance detection is the task of classifying the attitude expressed in a text towards a target such as Hillary Clinton as "positive", "negative" or "neutral". Previous work has assumed either that the target is mentioned in the text or that training data is available for every target. This work considers the more challenging setting in which targets are not always mentioned and no training data is available for the test targets. We experiment with conditional LSTM encoding, which builds a representation of the tweet that depends on the target, and show that it outperforms encoding the tweet and the target independently. Performance improves further when the conditional model is augmented with bidirectional encoding. On the SemEval 2016 Task 6 Twitter Stance Detection corpus, our approach is second only to a system trained on semi-automatically labelled tweets for the test target; when such weak supervision is added, it achieves state-of-the-art results.
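For a concrete picture of conditional encoding, here is a minimal sketch in PyTorch (not the paper's own code; all class and variable names are hypothetical): the tweet LSTM is initialised with the final state of the target LSTM, so the tweet representation depends on the target. The paper's full model additionally applies this conditioning bidirectionally.

    import torch
    import torch.nn as nn

    class ConditionalEncoder(nn.Module):
        """Sketch of conditional LSTM encoding: encode the target first,
        then encode the tweet starting from the target's final state."""
        def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, num_classes=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.target_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.tweet_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            # Three stance classes: positive / negative / neutral.
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, target_ids, tweet_ids):
            # Final (h, c) state of the target encoder ...
            _, target_state = self.target_lstm(self.embed(target_ids))
            # ... initialises the tweet encoder, making the tweet
            # representation target-dependent.
            _, (h, _) = self.tweet_lstm(self.embed(tweet_ids), target_state)
            return self.classifier(h[-1])

    # Toy usage with random token ids (batch of 2):
    model = ConditionalEncoder(vocab_size=1000)
    target = torch.randint(0, 1000, (2, 4))    # 4-token targets
    tweet = torch.randint(0, 1000, (2, 20))    # 20-token tweets
    logits = model(target, tweet)              # shape (2, 3)

Encoding the two sequences independently would instead concatenate two unconditioned states; passing the target state into the tweet encoder is what makes the representation conditional.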

Many current natural language processing applications for social media rely on representation learning and use pre-trained word embeddings. Several publicly available sets of pre-trained word embeddings exist, but they contain few or no emoji representations, even though emoji usage in social media has increased. In this work we release emoji2vec, pre-trained embeddings for all Unicode emojis, learned from their descriptions in the Unicode emoji standard. The resulting emoji embeddings can be used directly in downstream social natural language processing applications alongside word2vec. For the downstream task of sentiment analysis, we demonstrate that emoji embeddings learned from short descriptions outperform a skip-gram model trained on a large collection of tweets, while avoiding the need for emojis to appear frequently in context in order to estimate a representation.
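To make the training idea concrete, here is a minimal NumPy sketch (assumed details, not the released emoji2vec code): each emoji vector is trained with a logistic loss to score the sum of the fixed word vectors of its own Unicode description higher than a randomly sampled other description.

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 300  # matches the dimensionality of common word2vec models

    # Hypothetical stand-ins: real word vectors would come from word2vec.
    word_vecs = {w: rng.normal(size=dim)
                 for w in ["face", "tears", "joy", "red", "heart"]}
    descriptions = {"😂": ["face", "tears", "joy"], "❤": ["red", "heart"]}
    emoji_vecs = {e: rng.normal(scale=0.1, size=dim) for e in descriptions}

    def describe(words):
        # Description vector = sum of fixed, pre-trained word vectors.
        return sum(word_vecs[w] for w in words)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 0.01
    for _ in range(100):
        for emoji, words in descriptions.items():
            x = emoji_vecs[emoji]
            v = describe(words)
            # Positive example: gradient of -log sigmoid(x . v).
            grad = (sigmoid(x @ v) - 1.0) * v
            # Negative example: another emoji's description should score low.
            other = rng.choice([e for e in descriptions if e != emoji])
            neg = describe(descriptions[other])
            grad += sigmoid(x @ neg) * neg
            emoji_vecs[emoji] = x - lr * grad

Because the word vectors stay fixed, the learned emoji vectors end up in the same space as word2vec and can be used alongside it in downstream tasks such as sentiment analysis.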

UCL Natural Language Processing Meetup
UCL/BBC London Media Technology Campus
5th Floor, One Euston Square, 40 Melton Street, London NW1 2FD