This meetup features two talks on real-time visual analysis.
Talk 1: Towards real-time interpretation of the physical world with FPGA and DNNs
Speakers: Nicolas von Roden, Hirad Rezaeian
Abstract: We aim for a compact, affordable, and accurate real-time interpretation engine for the physical world. Processing high-resolution input from visual sensors in real time to achieve accurate scene understanding on scalable, robust, and cost-efficient hardware is a major challenge. We approach it through co-design of FPGAs and DNNs. On the software side, we exploit multi-task learning to combine different single-task models. On the hardware side, we quantize model weights and activation functions for efficient deployment on FPGA-based hardware.
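To give a flavour of the multi-task idea mentioned in the abstract, here is a minimal sketch (our own illustration, not Advertima's architecture): several task heads share one backbone, so the expensive feature computation happens once per frame. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical multi-task layout: one shared backbone feeds several
# lightweight task heads, so the costly feature pass runs once per frame.
backbone = rng.standard_normal((128, 32)) * 0.1   # shared feature extractor
head_pose = rng.standard_normal((32, 17)) * 0.1   # e.g. 17 keypoint scores
head_face = rng.standard_normal((32, 2)) * 0.1    # e.g. face / no-face logits

def forward(x):
    feats = np.maximum(x @ backbone, 0.0)         # shared ReLU features
    return feats @ head_pose, feats @ head_face   # one pass, two task outputs

frame_embedding = rng.standard_normal(128)        # stand-in for image features
pose_out, face_out = forward(frame_embedding)
```

The point of the sketch is the cost structure: adding a task adds only a small head, not a second full network.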
Bios: Nicolas is a Computer Vision Engineer at Advertima AG. He graduated in computer science from the University of Erlangen-Nuremberg with a focus on image processing and machine learning. He currently works on real-time computer vision tasks such as face recognition, pose detection, and tracking, as well as on combining the various single-task models into a multi-task framework. He previously worked on tumor detection in magnetic resonance images at Siemens Healthineers.
Hirad is a Hardware Digital Design Engineer at Advertima AG. He graduated from ETH Zurich as an electrical engineer with a focus on microelectronics and signal processing. His experience in digital signal processing and in adapting algorithms for hardware implementation (ASIC and FPGA) led him to work on ASICs for AI workload acceleration. He is currently working on quantizing the model weights and activation functions to reach a ternary weight network on an FPGA-based platform.
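As background for the ternary-weight goal in the bio above, here is a minimal sketch of one common ternarization scheme (the threshold heuristic from Ternary Weight Networks, Li & Liu, 2016); whether Advertima uses this exact scheme is not stated in the talk description.

```python
import numpy as np

def ternarize(w, threshold_factor=0.7):
    """Quantize float weights to {-1, 0, +1} times a per-tensor scale.

    Weights whose magnitude falls below a threshold become 0; the rest
    become +/-1 scaled by the mean magnitude of the surviving weights.
    On an FPGA this reduces multiplications to sign flips and skips.
    """
    delta = threshold_factor * np.mean(np.abs(w))  # quantization threshold
    mask = np.abs(w) > delta                       # weights kept non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # per-tensor scale
    return alpha * np.sign(w) * mask

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02])
q = ternarize(w)  # small weights are zeroed, the rest share one magnitude
```

In training, such quantization is usually applied in the forward pass while full-precision weights are kept for the gradient update; the sketch shows only the quantization step itself.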
Talk 2: A divide & conquer approach to real-time video segmentation on smartphones
Speaker: Noah Kutscher
Abstract: Real-time video segmentation for extracting humans from images faces two challenges: computation speed and segmentation quality. We use a semantic unit that estimates the foreground via a pre-trained deep neural network. This estimate does not have to be delivered in real time and can therefore rely on complex computation. In a second stage, it is used to train a small but fast model that classifies pixels without searching for semantic connections between them. Combining both methods yields a semantically correct, stable, and fast approach.
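The two-stage idea can be sketched as follows, under our own assumptions: the slow network's masks act as pseudo-labels for a lightweight per-pixel classifier. The "teacher" here is faked by a toy brightness rule, and the fast model is a plain logistic classifier; the talk's actual models are certainly more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (offline, slow): in the talk's setup a pre-trained DNN produces
# foreground masks. A toy rule stands in for it here: pixels brighter than
# mid-grey count as foreground. These masks become pseudo-labels.
pixels = rng.uniform(0.0, 1.0, size=(5000, 3))            # RGB pixel samples
teacher_mask = (pixels.mean(axis=1) > 0.5).astype(float)  # pseudo-labels

# Stage 2 (online, fast): train a tiny per-pixel logistic classifier on the
# teacher's masks. At runtime it costs one dot product per pixel, with no
# search for semantic relations between pixels.
feats = pixels - 0.5                # centre features so the boundary is near 0
w, b, lr = np.zeros(3), 0.0, 1.0
for _ in range(500):                # plain batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # sigmoid probabilities
    grad = p - teacher_mask
    w -= lr * (feats.T @ grad) / len(feats)
    b -= lr * grad.mean()

pred = (feats @ w + b) > 0.0        # fast per-pixel decision
accuracy = (pred == (teacher_mask > 0.5)).mean()
```

The design choice the sketch captures: all semantic reasoning is paid for once, offline, and the per-frame cost collapses to an independent decision per pixel.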
Bio: Noah is an undergraduate student at the University of Applied Sciences Mittweida, studying Digital Forensics. After many years of programming, his focus shifted to machine learning. Since early 2019, he has been researching real-time video segmentation at Cinector.