Razvan Pascanu, A closer look at some limitations of transformers


Details
We are pleased that Razvan Pascanu, one of the top Romanian researchers in ML / AI [1], will give an invited lecture at UPB on Wednesday, May 7th, at 7pm in EC105. Razvan works at Google DeepMind, is a co-organizer of EEML (the Eastern European Machine Learning summer school) and the Romanian AI Days, among many other initiatives, and is a close friend and mentor of the Romanian AI/ML community. This year he is also one of the PC Chairs of NeurIPS, the most prestigious ML / deep learning conference.
In his talk, Razvan will cover a highly relevant research topic: the limitations of transformer architectures. Details and some references are below.
If you work in AI/ML or have collaborators or students interested in these topics, it would be great if you could attend or forward the invitation to them.
Have a nice day,
Traian
------------------------
Title: A closer look at some limitations of transformers
Abstract:
Transformers are becoming a dominant architecture in machine learning, widely used from vision to language to reinforcement learning, and they form part of the backbone of modern large language models. In this talk we will discuss some potential limitations of the architecture itself, focusing predominantly on the attention layer as a mechanism for the temporal mixing of information. In particular, I will focus on recent work examining the ability of transformers to reason about position, generalize to longer contexts, explore, or learn full-rank representations. I will finish the talk with some general thoughts on how we can improve our understanding of these architectures and on the open questions facing us.
References:
https://arxiv.org/abs/2406.04267
https://arxiv.org/abs/2410.01104
https://arxiv.org/abs/2410.06205
https://arxiv.org/html/2504.02732v2
https://arxiv.org/abs/2504.16078
https://arxiv.org/abs/2503.21676
[1] - https://scholar.google.com/citations?user=6nKHDKYAAAAJ&hl=en&oi=ao (65k+ citations)
