Special on Transformer Models for Natural Language Processing


Details
Transformer models and the recent BERT algorithm overcome many of the limitations of the Word2Vec/LSTM approach: they improve context awareness and ambiguity resolution. Google recently announced that about 10% of English search queries will be processed by these new algorithms.
If you have never heard of transformer models, we recommend reading the following two papers (the core attention operation from the first one is summarized below):
Vaswani et al. 2017: Attention Is All You Need, https://arxiv.org/abs/1706.03762
Devlin et al. 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805
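For a quick preview, the core operation introduced in Vaswani et al. is scaled dot-product attention. In the paper's notation,

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

where Q, K and V are the query, key and value matrices and d_k is the dimensionality of the keys. Multi-head attention runs several of these in parallel on learned projections and concatenates the results.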
Try it out on your own! Type or copy a few sentences or dialogue fragments into https://talktotransformer.com/ and let GPT-2 generate a continuation. With every click on the button you'll get a new twist on how your story develops. It is awesome!
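If you would rather experiment locally than in the browser, a rough equivalent is possible with the Hugging Face transformers library. This is only one convenient option and is unrelated to the demo site above; a minimal sketch:

# Minimal sketch: local GPT-2 text generation with the Hugging Face
# "transformers" library (an assumption; the web demo above is a separate service).
# Install first:  pip install transformers torch
from transformers import pipeline

# Downloads the small pretrained GPT-2 model on first use.
generator = pipeline("text-generation", model="gpt2")

prompt = "The meetup on transformer models started with"
# Sampling (do_sample=True) gives a new twist on every run.
outputs = generator(prompt, max_length=60, do_sample=True, num_return_sequences=1)
print(outputs[0]["generated_text"])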
Talk 1 (Kazuki Irie and Albert Zeyer, RWTH Aachen)
Deep Transformer Models for Speech and Language Processing Using RETURNN
Talk 2 (Christoph Henkelmann, divis.io)
From Paper to Product – How we implemented BERT
Abstract Talk 1
In this talk, we will present our recent development of Transformer-based speech and language models at Prof. Hermann Ney's Chair of Computer Science 6 at RWTH Aachen University. The first part of the talk, given by Kazuki Irie, will focus on language modeling with deep Transformers and its application to automatic speech recognition. We show how the Transformer architecture, originally proposed for machine translation, can be scaled up to accommodate the large training data of the language modeling task and ultimately achieves excellent performance in automatic speech recognition. The second part of the talk, given by Albert Zeyer, will focus on our software RETURNN, RWTH's TensorFlow-based framework for neural networks. We will describe its flexible implementation, which allows researchers to experiment with various model architectures as well as different tasks. This flexibility will be illustrated by an example of end-to-end speech recognition based entirely on the Transformer.
Abstract Talk 2
(Christoph Henkelmann, divis.io)
From Paper to Product – How we implemented BERT
BERT is a state-of-the-art natural language processing (NLP) model that allows pretraining on unlabelled text data and subsequent transfer learning to a variety of NLP tasks. Due to its promising novel ideas and impressive performance, we chose it as a core component of a new natural language generation product. Reading a paper, and maybe following a tutorial with example code, is one thing; putting a working piece of software into production is quite another.
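To make the pretraining-then-transfer idea concrete, here is a minimal, purely illustrative sketch of fine-tuning a pretrained BERT checkpoint for text classification with the Hugging Face transformers library. It is not the setup described in the talk; model name, data and hyperparameters are placeholders.

# Illustrative sketch of BERT transfer learning (not the talk's setup):
# load a checkpoint pretrained on unlabelled text, add a fresh classification
# head, and fine-tune it on a small labelled dataset.
# Requires:  pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # randomly initialised task head
)

# Toy labelled data standing in for a real downstream task.
texts = ["the product works great", "this was a waste of money"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()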
In this talk we will tell you how we trained a custom version of the BERT network and integrated it into a natural language generation (NLG) application. You will hear how we arrived at the decision to use BERT and what other approaches we tried. We will discuss a number of changes to the vanilla BERT architecture that allowed us to train and deploy the network on consumer-grade GPUs and make it highly cost-effective, including a morph-based input encoding to reduce dimensionality and increase side-channel knowledge, and, of course, a lot of hyperparameter tuning.
We will tell you about the failures and mistakes we made so you do not have to repeat them, as well as the surprises, successes, and lessons learned.
Bios:
Albert Zeyer has been a Ph.D. student in the Human Language Technology Group at RWTH Aachen University, Germany, since 2014, under the supervision of Prof. Hermann Ney. He received both the Diplom (M.Sc.) in Mathematics and the Diplom (M.Sc.) in Computer Science from RWTH Aachen University in 2013. His research focuses on neural networks in general; his first studies of, and passion for, neural networks and connectionism go back to 1996. The topics of his recent work include recurrent networks, attention models, and end-to-end models in general, with applications in speech recognition, translation, and language modeling, where he has achieved many state-of-the-art results. Albert started developing software in 1995 and has published a variety of open-source projects since then. The TensorFlow-based software RETURNN, which he has developed as the main architect for his Ph.D. research, is now widely used by his teammates at RWTH Aachen University and beyond.
Kazuki Irie has been a Ph.D. student in the Human Language Technology Group at RWTH Aachen University, Germany, under the supervision of Prof. Hermann Ney, since May 2014. Prior to that, he received a Diplôme d'ingénieur degree from École Centrale Paris, France, and jointly a Master's degree (Master MVA) from ENS Cachan, France, both in Applied Mathematics, in 2013. His Ph.D. research focuses on advancing language modeling for its applications to speech recognition and machine translation. He is broadly interested in recurrent neural networks, language, and related machines. He interned twice at Google as a software engineer in a research role: in New York in 2017 and in Mountain View in 2018.
Christoph Henkelmann is the CTO and co-founder of divis.io. He studied Computer Science at the University of Bonn and has 20 years of industry experience building real-world software. Apart from his work at divis.io, he is a magazine author and a regular speaker on a range of AI topics at various conferences.
