Efficient Visual Self-Attention


Details
The attention mechanism is arguably one of the most important breakthroughs in deep learning in the last decade. It first appeared as an auxiliary module to assist with word alignment in machine translation. Later, the Transformer architecture replaced recurrence entirely with self-attention and swiftly took over the field of natural language processing.
Its adoption in computer vision did not come until recently, as the quadratic computational complexity of self-attention hindered its applications. This talk dives deep into a series of works by Mr. Shen on a novel efficient formulation of attention, its application to video understanding, and the quest for a fully-attentional architecture built on it.
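The complexity gap can be illustrated with a minimal NumPy sketch (an assumption-laden illustration, not code from the talk): standard dot-product attention forms an n-by-n position-pair matrix, costing O(n^2 d), while an efficient formulation in the spirit of the speaker's paper applies softmax to queries and keys separately and aggregates a small d-by-d global context first, costing O(n d^2).

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # O(n^2 d): materializes an n-by-n attention matrix.
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)
    return scores @ V

def efficient_attention(Q, K, V):
    # O(n d^2): normalize Q and K separately, then build a small
    # (d_k, d_v) global context before touching the queries.
    q = softmax(Q, axis=-1)   # softmax over the feature dimension
    k = softmax(K, axis=0)    # softmax over the position dimension
    context = k.T @ V         # (d_k, d_v) global context matrix
    return q @ context        # (n, d_v) output, no n-by-n matrix
```

The two functions are not numerically identical; the efficient version is an approximation that keeps the output shape and the global-aggregation behavior while making memory and compute linear in sequence length n.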
Lecture slides: https://docs.google.com/presentation/d/1EViv963ihIZZhemmgMc1_7Vbf-CeePiY
Talk is based on the speaker's papers:
- Efficient attention: https://arxiv.org/abs/1812.01243 ; https://github.com/cmsflash/efficient-attention
- Global context module: https://arxiv.org/abs/2001.11243
- GSA-Net: https://arxiv.org/abs/2010.03019
Presenter bio:
Mr. Zhuoran Shen holds a BEng in Computer Science from The University of Hong Kong. He is joining Pony.ai as a Software Engineer in Perception. Earlier, he was an AI Resident at Google Research and a Research Intern at Tencent and SenseTime. His research focuses on the attention mechanism for computer vision, including fully-attentional visual modeling and efficient attention. He is also interested in large-scale visual pretraining and applications of computer vision.
His page: https://cmsflash.github.io/