Details

In 2018-2019, transformer networks greatly improved what we can do with text - for better or worse. Transformer models like BERT, BigBird, and GPT-2, along with ELMo's contextual embeddings, jumped up the leaderboards and caused a step change in our NLP capabilities.

In some cases, they showed superhuman performance for the first time (paraphrasing and question answering). In others, they showed a worrying ability to generate spam. We can also use them to improve translation, summarisation, sentiment analysis, and much more.

As a result, several of our members have been a bit obsessed with these models. But how do they work? Sean and/or Mike will dive into transformer architectures with material from the Illustrated Transformer (http://jalammar.github.io/illustrated-transformer/).

People may also be interested in "The Annotated Transformer" (https://github.com/harvardnlp/annotated-transformer), and transformers in 200 lines of code (https://gist.github.com/thomwolf/ca135416a30ea387aa20edaa9b21f0ed).
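If you want a taste before the session, below is a minimal NumPy sketch of scaled dot-product attention, the core operation all of these resources build up to. Names and dimensions are illustrative only, not taken from any of the linked implementations:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

# Toy example: 3 tokens with 4-dimensional embeddings (random, for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (3, 4): one context-mixed vector per token

A full transformer stacks this operation across multiple heads and layers, which is exactly what the Illustrated Transformer walks through visually.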

Please arrive before 6pm to ensure entry.
