DataTalks #32: Optimizing Elements of BERT & AutoGAN-Distiller āœ”ļøšŸ§ 

Our 32nd DataTalks meetup will be held online and will focus on optimizing and distilling neural networks.

š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š˜‚š˜€š—¶š—»š—“ š˜š—µš—² š—¹š—¶š—»š—ø š—¶š˜€ š—ŗš—®š—»š—±š—®š˜š—¼š—æš˜†!
š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š—¹š—¶š—»š—ø: https://floor28.co.il/event/f00dbba5-7f1a-4588-8fe4-a221e41b363a

š—”š—“š—²š—»š—±š—®:
šŸ”· 18:00 - Opening words
šŸ”¶ 18:10 - 19:00 – schuBERT: Optimizing Elements of BERT – Zohar Karnin, Principal Applied Scientist at AWS
šŸ”“ 19:00 - 19:50 – GAN Distillation with AutoGAN-Distiller – Yoav Ramon, ML Engineer at Hi Auto

---------------------

š˜€š—°š—µš˜‚š—•š—˜š—„š—§: š—¢š—½š˜š—¶š—ŗš—¶š˜‡š—¶š—»š—“ š—˜š—¹š—²š—ŗš—²š—»š˜š˜€ š—¼š—³ š—•š—˜š—„š—§ – š—­š—¼š—µš—®š—æ š—žš—®š—æš—»š—¶š—», š—£š—æš—¶š—»š—°š—¶š—½š—®š—¹ š—”š—½š—½š—¹š—¶š—²š—± š—¦š—°š—¶š—²š—»š˜š—¶š˜€š˜ š—®š˜ š—”š—Ŗš—¦

Transformers have gradually become a key component of many state-of-the-art natural language representation models. The recent transformer-based model BERT achieved state-of-the-art results on various natural language processing tasks, including GLUE, SQuAD v1.1, and SQuAD v2.0. This model, however, is computationally prohibitive and has a huge number of parameters.

In this work we revisit the architecture choices of BERT in an effort to obtain a lighter model. We focus on reducing the number of parameters, yet our methods can be applied towards other objectives such as FLOPs or latency.

We show that much more efficient light models can be obtained by reducing algorithmically chosen, correct architecture design dimensions rather than making the common choice of reducing the number of Transformer encoder layers. In particular, our method uncovers the usefulness of a non-standard design choice for multi-head attention layers that makes them much more efficient. By applying our findings, our schuBERT gives 6.6% higher average accuracy on the GLUE and SQuAD datasets compared to BERT with three encoder layers, while having the same number of parameters.
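
As a rough illustration of the idea (not the paper's actual code), the Python sketch below shows how shrinking the per-head dimension of a multi-head attention layer, rather than dropping encoder layers, reduces the parameter count. All class names, sizes, and the choice of head_dim here are hypothetical.

import torch
import torch.nn as nn

class SlimMultiHeadAttention(nn.Module):
    """Multi-head self-attention with a per-head dimension (head_dim)
    decoupled from hidden_dim / num_heads, so it can be tuned as an
    independent design dimension. Hypothetical illustration only."""
    def __init__(self, hidden_dim: int, num_heads: int, head_dim: int):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        inner = num_heads * head_dim              # total attention width
        self.q = nn.Linear(hidden_dim, inner)
        self.k = nn.Linear(hidden_dim, inner)
        self.v = nn.Linear(hidden_dim, inner)
        self.out = nn.Linear(inner, hidden_dim)   # project back to hidden_dim

    def forward(self, x):                         # x: (batch, seq, hidden_dim)
        b, s, _ = x.shape
        def split(t):                             # -> (batch, heads, seq, head_dim)
            return t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        y = (att @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out(y)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# BERT-base-style layer: 12 heads x 64 dims = 768 inner width.
full = SlimMultiHeadAttention(hidden_dim=768, num_heads=12, head_dim=64)
# Slimmer variant: same number of heads, smaller per-head dimension.
slim = SlimMultiHeadAttention(hidden_dim=768, num_heads=12, head_dim=32)
print(n_params(full), n_params(slim))  # slim layer has roughly half the parameters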

š—£š—®š—½š—²š—æ š—¹š—¶š—»š—ø: https://www.aclweb.org/anthology/2020.acl-main.250.pdf

š—•š—¶š—¼: Zohar Karnin received his Ph.D in computer science from the Technion, Israel Institute of Technology at 2011. His research interests are in the area of large scale and online machine learning algorithms. He is currently a Principal Scientist in Amazon AWS AI leading the science for multiple efforts in SageMaker, an environment for machine learning development.

---------------------

š—šš—”š—” š——š—¶š˜€š˜š—¶š—¹š—¹š—®š˜š—¶š—¼š—» š˜„š—¶š˜š—µ š—”š˜‚š˜š—¼š—šš—”š—”-š——š—¶š˜€š˜š—¶š—¹š—¹š—²š—æ – š—¬š—¼š—®š˜ƒ š—„š—®š—ŗš—¼š—», š— š—Ÿ š—˜š—»š—“š—¶š—»š—²š—²š—æ š—®š˜ š—›š—¶ š—”š˜‚š˜š—¼

GANs can get extremely big, reaching up to 1,200 GFLOPs (one GFLOP is one billion floating-point operations). For reference, MobileNet requires about 0.5 GFLOPs, roughly 2,400 times less compute.

This is why in many cases we want to lower the number of parameters of our GANs, whether to save costs when running in the cloud or to be able to run these networks on edge devices. The problem is that classical methods like pruning and model distillation, which work well for other networks, don't work well for GANs. AutoGAN-Distiller (Yonggan Fu et al.) is the first practical method for lowering the number of parameters of a GAN, and it does so with constrained AutoML techniques.
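
As a simplified, hypothetical sketch of the distillation idea (not the AGD repository's code), the snippet below trains a small student generator to match a frozen teacher's outputs, and checks a crude FLOPs estimate against a compute budget. In AutoGAN-Distiller the student architecture itself is searched under such a constraint; here the student is fixed, so the constraint is only reported. All module shapes and the budget are made up for illustration.

import torch
import torch.nn as nn

# Frozen "teacher" GAN generator and a much smaller student (toy stand-ins).
teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 3, 3, padding=1)).eval()
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 3, 3, padding=1))

def flops_estimate(model, hw=(64, 64)):
    # Crude conv FLOPs: 2 * k*k * c_in * c_out * H * W per layer.
    total = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            k = m.kernel_size[0]
            total += 2 * k * k * m.in_channels * m.out_channels * hw[0] * hw[1]
    return total

budget = 50e6  # hypothetical FLOPs budget for the compressed generator
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
for step in range(100):
    x = torch.randn(4, 3, 64, 64)            # stand-in generator inputs
    with torch.no_grad():
        target = teacher(x)                    # teacher outputs as "soft labels"
    loss = nn.functional.l1_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(flops_estimate(student) <= budget)       # does the student fit the budget?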

In my lecture I will talk about this research and also about a project I did that involved distilling MelGAN, a vocoder used for real-time text-to-speech generation.

š—£š—®š—½š—²š—æ š—¹š—¶š—»š—ø: https://arxiv.org/pdf/2006.08198v1.pdf

š—„š—²š—½š—¼: https://github.com/TAMU-VITA/AGD

š—•š—¶š—¼: Yoav Ramon is an ML Engineer and first worker at Hi Auto, A newly founded startup.

---------------------

š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š˜‚š˜€š—¶š—»š—“ š˜š—µš—² š—¹š—¶š—»š—ø š—¶š˜€ š—ŗš—®š—»š—±š—®š˜š—¼š—æš˜†!
š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š—¹š—¶š—»š—ø: https://floor28.co.il/event/f00dbba5-7f1a-4588-8fe4-a221e41b363a
