
Attention Mechanisms in Deep Learning

Hosted By
Andrew R. and 2 others

Details

Sign-up is on Skills Matter (https://skillsmatter.com/meetups/9706-attention-mechanisms-in-deep-learning). Meetup.com RSVP is not used for this event.

-----------

Introduction

In deep NLP, recurrent neural networks (RNNs) are used to generate a sequence of words from an image, video, or another sentence. However, the entire input must be compressed into a fixed-length, lower-dimensional vector, which inevitably loses information. This is particularly problematic when generating long sequences of words. Even LSTMs have a finite memory!

Attention mechanisms allow the RNN to attend to any part of the input image/video/sentence when generating the next word. This leads to better translations and interesting new ways to introspect our deep NLP models. In this session we'll dive into the seminal work of Bahdanau, Cho, and Bengio (https://arxiv.org/abs/1409.0473) to get a better understanding of how and why these architectures work so well.
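The core of the paper's additive ("Bahdanau") attention fits in a few lines. Below is a minimal NumPy sketch, not the paper's reference code: the names W_a, U_a and v_a loosely follow the paper's notation, the dimensions and parameter values are random toy choices for illustration, and batching, masking and the decoder RNN itself are omitted.

  import numpy as np

  def softmax(x):
      e = np.exp(x - x.max())
      return e / e.sum()

  def additive_attention(decoder_state, encoder_states, W_a, U_a, v_a):
      # decoder_state:  (d_dec,)   previous decoder hidden state s_{i-1}
      # encoder_states: (T, d_enc) encoder annotations h_1 .. h_T
      # W_a, U_a, v_a:  learned projections (random here, for illustration)
      # Alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
      scores = np.tanh(decoder_state @ W_a.T + encoder_states @ U_a.T) @ v_a
      # Normalise into attention weights alpha_ij over the input positions
      weights = softmax(scores)
      # Context vector c_i: weighted sum of encoder annotations
      context = weights @ encoder_states
      return context, weights

  # Toy usage with random parameters
  rng = np.random.default_rng(0)
  T, d_enc, d_dec, d_att = 5, 8, 8, 16
  h = rng.normal(size=(T, d_enc))     # encoder annotations
  s = rng.normal(size=(d_dec,))       # previous decoder state
  W_a = rng.normal(size=(d_att, d_dec))
  U_a = rng.normal(size=(d_att, d_enc))
  v_a = rng.normal(size=(d_att,))
  context, alpha = additive_attention(s, h, W_a, U_a, v_a)
  print(alpha)  # weights over the 5 input positions, summing to 1

At each decoding step the context vector is recomputed, so the model can look at a different part of the input for every output word; the weights alpha are also what you plot to introspect what the model attends to.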

Blog

Attention and Memory in Deep Learning and NLP (http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/) - Wild ML

Paper

Neural Machine Translation by Jointly Learning to Align and Translate (https://arxiv.org/abs/1409.0473), D. Bahdanau, K. Cho, and Y. Bengio, ICLR 2015

Code

A TensorFlow implementation (https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/seq2seq_model.py) of a sequence-to-sequence model with an attention mechanism is described in the TensorFlow seq2seq tutorial (https://www.tensorflow.org/tutorials/seq2seq).

--------

Background Material

Oxford CS: Deep Learning for Natural Language Processing 2016-2017, Lecture 8 (https://github.com/oxford-cs-deepnlp-2017/lectures#10-lecture-8---generating-language-with-attention-chris-dyer): slides (https://github.com/oxford-cs-deepnlp-2017/lectures/blob/master/Lecture%208%20-%20Conditional%20Language%20Modeling%20with%20Attention.pdf) and recording (http://media.podcasts.ox.ac.uk/comlab/deep_learning_NLP/2017-01_deep_NLP_8_conditional_lang_mod_att.mp4).

---

A note about the Journal Club format:

  1. There is no speaker at Journal Club.

  2. There is NO speaker at Journal Club.

  3. We split into small groups of 6 people and discuss the papers. For the first hour the groups are random to make sure everyone is on the same page. Afterwards we split into blog/paper/code groups to go deeper.

  4. Volunteers sometimes seed the discussion by walking through the paper's highlights for 5 minutes. You are very welcome to volunteer in the comments.

  5. Reading the materials in advance is really helpful. If you don't have time, please come anyway. We need this group to learn together.

London Data Science Journal Club
Skills Matter at CodeNode
10 South Place, London · EC2M 2RB