• Irwan Bello | LambdaNetworks and Recent Developments in Computer Vision

    Virtual London Machine Learning Meetup - [masked] @ 18:30

    We would like to invite you to our next Virtual Machine Learning Meetup. Please read the papers below and help us create a vibrant discussion.

    The discussion will be facilitated by Gül Varol, an Assistant Professor at the IMAGINE team of École des Ponts ParisTech. Previously, she was a postdoctoral researcher at the University of Oxford (VGG), working with Andrew Zisserman. She obtained her PhD from the WILLOW team of Inria Paris and École Normale Supérieure. Her research is focused on human understanding in videos, specifically action recognition, body shape and motion analysis, and sign language.

    Agenda:
    - 18:25: Virtual doors open
    - 18:30: Talk
    - 19:10: Q&A session
    - 19:30: Close

    *Sponsors*
    https://evolution.ai/ : Machines that Read - Intelligent data extraction from corporate and financial documents.

    * Title: LambdaNetworks and Recent Developments in Computer Vision
    (Irwan Bello is a Research Scientist at Google Brain, where he works on Deep Learning)

    * Papers:
    https://arxiv.org/abs/2102.08602
    https://arxiv.org/abs/2103.07579

    Abstract: The first part of the talk will be dedicated to LambdaNetworks: Modeling Long-Range Interactions Without Attention. Lambda layers are a scalable alternative framework to self-attention for capturing long-range structured interactions between an input and contextual information. Like linear attention, lambda layers bypass expensive attention maps, but in contrast they model both content and position-based interactions, which enables their application to large structured inputs such as images. The resulting neural network architectures, LambdaNetworks, significantly outperform their convolutional and attentional counterparts on ImageNet classification, COCO object detection and instance segmentation, while being more computationally efficient. We design LambdaResNets that reach state-of-the-art accuracies on ImageNet while being 3.2-4.4x faster than the popular EfficientNets on modern machine learning accelerators. In large-scale semi-supervised training, LambdaResNets achieve up to 86.7% ImageNet accuracy while being 9.5x faster than EfficientNet NoisyStudent and 9x faster than a Vision Transformer with comparable accuracy.
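
    As a rough illustration of the mechanism sketched in the abstract, here is a minimal single-head lambda layer in PyTorch. This is a hedged sketch, not the paper's implementation: the class name and dimensions are ours, and the learned absolute position embeddings below stand in for the multi-query heads and relative position embeddings used in the paper.

    import torch
    import torch.nn as nn

    class LambdaLayer(nn.Module):
        def __init__(self, d_in, d_k, d_v, n):
            super().__init__()
            self.to_q = nn.Linear(d_in, d_k, bias=False)  # queries from the input
            self.to_k = nn.Linear(d_in, d_k, bias=False)  # keys from the context
            self.to_v = nn.Linear(d_in, d_v, bias=False)  # values from the context
            # Learned absolute position embeddings of shape (n, n, k); an
            # illustrative simplification of the paper's relative embeddings.
            self.pos = nn.Parameter(torch.randn(n, n, d_k) * 0.02)

        def forward(self, x):
            # x: (batch, n, d_in); here the context is the input itself.
            q = self.to_q(x)                 # (b, n, k)
            k = self.to_k(x).softmax(dim=1)  # normalize keys over context positions
            v = self.to_v(x)                 # (b, n, v)
            # Content lambda: a single (k, v) linear map shared by all queries.
            lam_c = torch.einsum('bmk,bmv->bkv', k, v)
            # Position lambdas: one (k, v) linear map per query position.
            lam_p = torch.einsum('nmk,bmv->bnkv', self.pos, v)
            # Applying the lambdas to the queries never materializes an
            # n x m attention map, which is the point of the method.
            y = torch.einsum('bnk,bkv->bnv', q, lam_c)
            y = y + torch.einsum('bnk,bnkv->bnv', q, lam_p)
            return y

    For example, LambdaLayer(d_in=32, d_k=16, d_v=32, n=64) applied to a (2, 64, 32) input tensor returns a (2, 64, 32) output.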

    Second, I'll discuss a recent preprint, Revisiting ResNets: Improved Training and Scaling Strategies. Novel vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best-performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended. Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x-2.7x faster than EfficientNets on TPUs while achieving similar accuracies on ImageNet. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification. We recommend that practitioners use these simple revised ResNets as baselines for future research.
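
    As a loose illustration of the two scaling recommendations, the Python sketch below shows how such a rule might look in practice. The function name, the long-training flag, and the 0.25 resolution exponent are assumptions made for the example, not values from the paper.

    def scaled_resnet_config(depth, width_mult, image_res, scale, long_training):
        """Illustrative scaling rule following the two strategies above."""
        if long_training:
            # (1) In long-training regimes where overfitting can occur,
            # prefer scaling model depth.
            depth = round(depth * scale)
        else:
            # Otherwise width scaling is preferable.
            width_mult *= scale
        # (2) Increase image resolution more slowly than compound scaling
        # would (the 0.25 exponent is an assumed placeholder).
        image_res = round(image_res * scale ** 0.25)
        return depth, width_mult, image_res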

    Bio: Irwan Bello is a Research Scientist at Google Brain, where he works on Deep Learning. His research interests primarily lie in modeling, scaling, and designing architectures that process structured information while trading off scalability and inductive biases.