Alexey Bochkovskiy | YOLOv4 and Dense Prediction Transformers


Details
Virtual London Machine Learning Meetup - 12.01.2022 @ 18:30
We would like to invite you to our next Virtual Machine Learning Meetup.
Agenda:
- 18:25: Virtual doors open
- 18:30: Talk
- 19:10: Q&A session
- 19:30: Close
Sponsors
https://evolution.ai/ : Machines that Read - Intelligent data extraction from corporate and financial documents.
- Title: YOLOv4 and Dense Prediction Transformers
- Speaker: Alexey Bochkovskiy, research engineer at Intel
Papers: YOLOv4: https://arxiv.org/abs/2004.10934
Scaled-YOLOv4: Scaling Cross Stage Partial Network: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html
Vision Transformers for Dense Prediction: https://openaccess.thecvf.com/content/ICCV2021/html/Ranftl_Vision_Transformers_for_Dense_Prediction_ICCV_2021_paper.html
Abstract: Recently, transformer-based neural networks have increasingly been used in computer vision in place of convolutional networks. We will talk about their theoretical and practical advantages (attention, global receptive field) and disadvantages (quadratic growth in complexity with increasing resolution, no built-in spatial invariance), and about ways to optimize them for current and future hardware. Much research aims to build the most accurate neural networks by increasing the size of the network and of the private dataset. The speed of such networks is often unacceptably low on widely available consumer hardware, and their accuracy is low if only publicly available datasets are used. Another problem is that neural networks are often optimized for FLOPS, a theoretical measure of computational complexity, without considering the degree of parallelism on multicore / multichip devices or memory bandwidth requirements, which leads to poor real-world speed. In this talk, we will see ways to build neural networks that are faster and run on cheaper hardware while maintaining accuracy. We will discuss the problems that arise in various computer vision tasks and ways to solve them, and which neural network architectures will be preferable in the future.
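The abstract's point about quadratic attention cost versus resolution can be illustrated with a back-of-the-envelope FLOP count. The shapes and constants below (16x16 patches, embedding dimension 768, a 3x3 convolution with 64 channels) are illustrative assumptions, not figures from the talk:

```python
def attention_flops(h, w, d):
    """Approximate FLOPs for one global self-attention layer over an
    h*w token grid with embedding dimension d. The QK^T and
    attention*V matrix multiplies dominate: two (n x n x d) matmuls,
    counting 2 FLOPs per multiply-accumulate."""
    n = h * w                      # number of tokens
    return 2 * (n * n * d) * 2

def conv_flops(h, w, c, k=3):
    """Approximate FLOPs for one k x k convolution with c input and
    c output channels over an h x w feature map."""
    return h * w * (k * k * c) * c * 2

# Doubling the input resolution quadruples the token count, so
# attention cost grows 16x while convolution cost grows only 4x.
for res in (224, 448):
    tokens = (res // 16) ** 2      # 16x16 patches, as in ViT-style models
    print(res, tokens,
          attention_flops(res // 16, res // 16, 768),
          conv_flops(res, res, 64))
```

This is why dense-prediction tasks, which need high-resolution inputs, make global attention expensive, and why FLOP counts alone say little about achievable speed on parallel hardware.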
Bio: Alexey Bochkovskiy is a research engineer at Intel focusing on deep learning and computer vision, with six years of experience in machine learning and over sixteen years of experience as a C++ developer. He is the author of state-of-the-art neural networks in highly competitive tasks, ahead of developments from the world's leading IT companies: object detection (YOLOv4, Scaled-YOLOv4), semantic segmentation (DPT), and monocular depth estimation (DPT). He specializes in PyTorch, C++, and CUDA; YOLOv4 / Scaled-YOLOv4 is used by the Taiwan government, the BMW Innovation Lab, Amazon, and other companies.
The discussion will be facilitated by Sasha Sax. Sasha is a PhD student at UC Berkeley. His research focuses on designing visual perception systems for real-world agents, and characterizing how the optimal perception system depends on the agent's objectives and environment. His work has received several awards, including CVPR 2018 Best Paper and CVPR 2020 Best Paper Honorable Mention, an NVIDIA Pioneering Research award, and first place at the Habitat AI navigation challenge (RGB only). He’s advised by Jitendra Malik and Amir Zamir, and holds a BS (Math) and MS (CS) from Stanford University.
