Attention Mechanisms in Deep Learning


Details
Sign-up is on Skills Matter (https://skillsmatter.com/meetups/9706-attention-mechanisms-in-deep-learning). Meetup.com RSVP is not used for this event.
-----------
Introduction
In deep NLP, recurrent neural networks (RNNs) are used to generate a sequence of words from an image, video, or another sentence. However, the entire input must be compressed into a fixed-length, lower-dimensional vector, which inevitably loses information. This is particularly problematic when generating long sequences of words. Even LSTMs have a finite memory!
Attention mechanisms allow the RNN to attend to any part of the input image/video/sentence when generating the next word. This leads to better translations and interesting new ways to introspect our deep NLP models. In this session we'll dive into the seminal work of Bahdanau, Cho, and Bengio (https://arxiv.org/abs/1409.0473) to get a better understanding of how and why these architectures work so well.
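To make the alignment idea concrete, here is a minimal NumPy sketch of Bahdanau-style additive attention. It is an illustrative sketch, not the paper's code or the TensorFlow implementation linked below: the function name, parameter shapes, and random values are assumptions chosen for readability; only the score e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j) and the softmax-weighted context vector follow the paper.

import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector of scores
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(s_prev, encoder_states, W_a, U_a, v_a):
    # s_prev: previous decoder state s_{i-1}, shape (n,)
    # encoder_states: annotations h_1..h_Tx, shape (Tx, 2n) for a bidirectional encoder
    # W_a, U_a, v_a: learned alignment-model parameters (illustrative shapes)
    # Alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in encoder_states])
    alpha = softmax(scores)                              # attention weights over source positions
    context = (alpha[:, None] * encoder_states).sum(0)   # expected annotation under alpha
    return context, alpha

# Toy shapes only: decoder state size n=4, annotation size 2n=8, Tx=5 source positions
rng = np.random.default_rng(0)
n, Tx = 4, 5
s_prev = rng.normal(size=n)
H = rng.normal(size=(Tx, 2 * n))
W_a = rng.normal(size=(n, n))
U_a = rng.normal(size=(n, 2 * n))
v_a = rng.normal(size=n)
context, alpha = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha.sum(), context.shape)  # weights sum to 1; context keeps the annotation dimension

At each decoding step, the context vector is fed into the decoder together with the previous state and the previously emitted word, so every output word can draw on a different weighted view of the source sentence.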
Blog
Attention and Memory in Deep Learning and NLP (http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/) - Wild ML
Paper
Neural Machine Translation by Jointly Learning to Align and Translate (https://arxiv.org/abs/1409.0473), D Bahdanau, K Cho, Y Bengio - ICLR 2015
Code
A TensorFlow implementation (https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/seq2seq_model.py) of a sequence-to-sequence model with an attention mechanism is described here (https://www.tensorflow.org/tutorials/seq2seq).
--------
Background Material
Oxford CS: Deep Learning for Natural Language Processing 2016-2017 Lecture 8 (https://github.com/oxford-cs-deepnlp-2017/lectures#10-lecture-8---generating-language-with-attention-chris-dyer): slides (https://github.com/oxford-cs-deepnlp-2017/lectures/blob/master/Lecture%208%20-%20Conditional%20Language%20Modeling%20with%20Attention.pdf) and recording (http://media.podcasts.ox.ac.uk/comlab/deep_learning_NLP/2017-01_deep_NLP_8_conditional_lang_mod_att.mp4).
---
A note about the Journal Club format:
- There is no speaker at Journal Club.
- There is NO speaker at Journal Club.
- We split into small groups of 6 people and discuss the papers. For the first hour the groups are random, to make sure everyone is on the same page. Afterwards we split into blog/paper/code groups to go deeper.
- Volunteers sometimes seed the discussion by guiding the group through the paper highlights for 5 minutes. You are very welcome to volunteer in the comments.
- Reading the materials in advance is really helpful. If you don't have time, please come anyway. We need this group to learn together.