Deep neural networks have become the most widely adopted models for most computer vision and natural language processing tasks. Since the renewed interest in these architectures sparked in 2012, their memory footprint and computational cost have grown tremendously, which has hindered their deployment. In response, the deep learning community has investigated methods to compress and accelerate these models, which fall into four main families: efficient architecture design, tensor decomposition, pruning, and quantization. In this presentation, I paint a landscape of the current state of the art in deep neural network compression and acceleration, as well as my contributions to the field. In particular, in SInGE, I proposed a new importance-based criterion for data-driven pruning. This criterion was inspired by attribution techniques, which rank inputs by their relative importance with respect to the final prediction. In SInGE, I adapted one of the most effective attribution techniques to weight importance ranking for pruning. Regarding quantization, I will go over the technical details behind the three major components that enable on-device generative AI inference: mixed precision, non-uniform representations, and training.
Reference: https://proceedings.neurips.cc/paper_files/paper/2022/file/e5b0ec2c61957bfd6cf88e118107cc71-Paper-Conference.pdf
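To make the attribution-based pruning criterion more concrete, here is a minimal PyTorch sketch of an integrated-gradients-style weight importance score, in the spirit of the approach described above. The exact criterion used in SInGE (see the paper linked above) differs; the function name, the number of integration steps, and the use of a cross-entropy loss are illustrative assumptions.

import torch
import torch.nn as nn

def attribution_importance(model, layer, inputs, targets, steps=8):
    # Integrated-gradients-style importance of one layer's weights:
    # scale the weight from 0 to its trained value W and accumulate the
    # gradient of the loss at each step, approximating W * int_0^1 dL/dW(a*W) da.
    criterion = nn.CrossEntropyLoss()
    w = layer.weight.detach().clone()
    scores = torch.zeros_like(w)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        layer.weight.data = alpha * w                  # move along the 0 -> W path
        loss = criterion(model(inputs), targets)
        grad, = torch.autograd.grad(loss, layer.weight)
        scores += grad
    layer.weight.data = w                              # restore the trained weights
    return (scores * w / steps).abs()                  # larger score = more important

Weights, or whole filters once the scores are aggregated per channel, with the smallest accumulated attribution are the first candidates for removal.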
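Similarly, as an illustration of what "non-uniform representations" refers to in the quantization part of the talk, the sketch below (also PyTorch, with a hypothetical helper name) replaces the uniform grid of a standard n-bit quantizer with a small codebook fitted to the weight distribution via 1-D k-means; production on-device schemes are of course more elaborate.

import torch

def nonuniform_quantize(w, n_bits=4, iters=20):
    # Codebook (non-uniform) quantization of a weight tensor: the 2**n_bits
    # levels are fitted to the weight distribution with a 1-D k-means,
    # instead of being spread on a uniform grid.
    n_levels = 2 ** n_bits
    flat = w.detach().flatten()
    centroids = torch.quantile(flat, torch.linspace(0.0, 1.0, n_levels))  # init on quantiles
    for _ in range(iters):                                                # Lloyd iterations
        idx = (flat[:, None] - centroids[None, :]).abs().argmin(dim=1)
        for k in range(n_levels):
            if (idx == k).any():
                centroids[k] = flat[idx == k].mean()
    idx = (flat[:, None] - centroids[None, :]).abs().argmin(dim=1)
    return centroids[idx].view_as(w), centroids        # de-quantized weights + codebook

Mixed precision then amounts to choosing a different n_bits per layer or tensor according to its sensitivity, and training (for instance quantization-aware fine-tuning) recovers the accuracy lost to rounding.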
bio:
Edouard received his master's degree from Ecole Normale Superieure Paris-Saclay in 2020 and defended his Ph.D., carried out at ISIR (Sorbonne Université) and Datakalab, in November 2023. His work led to the acquisition of Datakalab by Apple in December 2023. His research interests include, but are not limited to, DNN solutions for computer vision tasks and the compression and acceleration of such models.