Efficient Visual Self-Attention

Hosted By
Peter N.
Details

The attention mechanism is arguably one of the most important breakthroughs in deep learning in the last decade. It first appeared as an auxiliary module assisting word alignment in machine translation. Later, the Transformer architecture replaced recurrence entirely with self-attention and swiftly took over the field of natural language processing.

Its adoption in computer vision came only recently, as the quadratic computational complexity of self-attention long hindered its application to images. This talk dives into a series of works by Mr. Shen on a novel efficient formulation of attention, its application to video understanding, and the quest for a fully-attentional architecture built on top of it.
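The core idea behind the efficient formulation can be illustrated in a few lines: instead of forming the n-by-n attention map (quadratic in the number of pixels n), the keys are first aggregated with the values into a small global context matrix, which is then queried. Below is a minimal NumPy sketch under the assumptions of the efficient attention paper (softmax over the feature dimension for queries and over positions for keys); shapes and variable names are illustrative, not the authors' exact implementation.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Forms an explicit n x n attention map: O(n^2 * d) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    return softmax(scores, axis=1) @ V

def efficient_attention(Q, K, V):
    # Aggregates keys and values into a small d_k x d_v global context
    # matrix first, then queries it: O(n * d^2) time and memory.
    q = softmax(Q, axis=1)     # normalize each query over features
    k = softmax(K, axis=0)     # normalize each key channel over positions
    context = k.T @ V          # (d_k, d_v) global context, independent of n
    return q @ context         # (n, d_v) output

# Toy example: n positions with small feature dimensions.
rng = np.random.default_rng(0)
n, dk, dv = 1024, 32, 64
Q = rng.standard_normal((n, dk))
K = rng.standard_normal((n, dk))
V = rng.standard_normal((n, dv))
out = efficient_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

The intermediate context matrix is only d_k by d_v regardless of n, which is what lets this formulation scale to high-resolution feature maps where n is tens of thousands of pixels.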

Lecture slides: https://docs.google.com/presentation/d/1EViv963ihIZZhemmgMc1_7Vbf-CeePiY

Talk is based on the speaker's papers:

  1. Efficient attention: https://arxiv.org/abs/1812.01243 ; https://github.com/cmsflash/efficient-attention
  2. Global context module: https://arxiv.org/abs/2001.11243
  3. GSA-Net: https://arxiv.org/abs/2010.03019

Presenter bio:
Mr. Zhuoran Shen holds a BEng in Computer Science from The University of Hong Kong. He is joining Pony.ai as a Software Engineer in Perception. Earlier, he was an AI Resident at Google Research and a Research Intern at Tencent and SenseTime. His research focuses on the attention mechanism for computer vision, including fully-attentional visual modeling and efficient attention. He is also interested in large-scale visual pretraining and applications of computer vision.

His page: https://cmsflash.github.io/

2d3d.ai
Online event