Recurrent Neural Networks with Attention and why they are so hard to train

Dear Data Science Journal readers!

Please read the materials before the meetup. If you don't get a chance, feel free to come and join the discussion anyway.

1) Motivation: "Teaching Machines to Read and Comprehend" by DeepMind.

2) A mathematical and code-level treatment of attention in neural networks: I'm looking for suggestions in the comments.

One idea: "Recurrent Models of Visual Attention".

3) Pascanu et al., "On the difficulty of training Recurrent Neural Networks".
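For item 2, here is a minimal sketch of soft (differentiable) attention in plain Python, just to give the discussion something concrete. It assumes simple dot-product scoring: each encoder state is scored against a query vector, the scores are turned into weights with a softmax, and the context vector is the weighted sum of the states. This is a simplification, not the exact model from either paper: the Attentive Reader in "Teaching Machines to Read and Comprehend" scores states with a learned tanh layer, and "Recurrent Models of Visual Attention" uses hard glimpses trained with REINFORCE rather than a softmax. The function names are our own, hypothetical choices.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_attention(query: Vector, keys: List[Vector],
                   values: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product soft attention (an assumed scoring function).

    score_t   = <query, key_t>
    weights   = softmax(scores)          # non-negative, sums to 1
    context   = sum_t weights_t * value_t
    Returns (context, weights).
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


if __name__ == "__main__":
    # Three hypothetical encoder hidden states, e.g. one per document token.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Hypothetical query, e.g. the question encoding or current decoder state.
    query = [2.0, 0.0]
    context, weights = soft_attention(query, states, states)
    print("weights:", [round(w, 3) for w in weights])
    print("context:", [round(c, 3) for c in context])
```

With the query [2.0, 0.0], the first and third states get the same score and therefore equal, larger weights, while the second state is mostly ignored. Because every step is differentiable, the attention weights can be trained with ordinary backpropagation together with the RNN that produces the states.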
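For item 3, a toy illustration of the central argument, under heavy simplifying assumptions: a scalar, linear recurrence h_t = w * h_{t-1} with no inputs and no nonlinearity. Backpropagating through T steps multiplies the gradient by the same factor T times, so it grows like w**T when |w| > 1 and shrinks toward zero when |w| < 1. Pascanu et al. analyse the general matrix case and, for exploding gradients, propose rescaling the gradient whenever its norm exceeds a threshold; clip_by_norm below is a sketch of that rule, and the threshold used here is arbitrary.

```python
import math
from typing import List


def backprop_factor(w: float, steps: int) -> float:
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}.

    Each time step multiplies the gradient by the same Jacobian (here just w),
    so the result is w ** steps: it explodes for |w| > 1, vanishes for |w| < 1.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad


def clip_by_norm(grad: List[float], threshold: float) -> List[float]:
    """Rescale grad so its L2 norm is at most threshold, keeping its direction."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)


if __name__ == "__main__":
    for w in (0.9, 1.0, 1.1):
        print(f"w={w}: gradient factor after 50 steps = {backprop_factor(w, 50):.5g}")
    # A hypothetical exploding gradient with norm 50, clipped to norm 5.
    print("clipped:", clip_by_norm([30.0, 40.0], threshold=5.0))
```

With 50 steps, w = 1.1 gives a factor above 100 and w = 0.9 one below 0.01, even though both weights are close to 1. Clipping limits only the length of the gradient, so it helps with explosion but not with vanishing; for the vanishing case the paper proposes a separate regularization term instead.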

Introductory blog posts, theses, and IPython notebooks are very welcome in the comments.

