DataTalks #32: Optimizing Elements of BERT & AutoGAN-Distiller āœ”ļøšŸ§ 

Our 32nd DataTalks meetup will be held online and will focus on optimizing and distilling neural networks.

š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š˜‚š˜€š—¶š—»š—“ š˜š—µš—² š—¹š—¶š—»š—ø š—¶š˜€ š—ŗš—®š—»š—±š—®š˜š—¼š—æš˜†!
š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š—¹š—¶š—»š—ø: https://floor28.co.il/event/f00dbba5-7f1a-4588-8fe4-a221e41b363a

š—”š—“š—²š—»š—±š—®:
šŸ”· 18:00 - Opening words
šŸ”¶ 18:10 - 19:00 – schuBERT: Optimizing Elements of BERT – Zohar Karnin, Principal Applied Scientist at AWS
šŸ”“ 19:00 - 19:50 – GAN Distillation with AutoGAN-Distiller – Yoav Ramon, ML Engineer at Hi Auto

---------------------

š˜€š—°š—µš˜‚š—•š—˜š—„š—§: š—¢š—½š˜š—¶š—ŗš—¶š˜‡š—¶š—»š—“ š—˜š—¹š—²š—ŗš—²š—»š˜š˜€ š—¼š—³ š—•š—˜š—„š—§ – š—­š—¼š—µš—®š—æ š—žš—®š—æš—»š—¶š—», š—£š—æš—¶š—»š—°š—¶š—½š—®š—¹ š—”š—½š—½š—¹š—¶š—²š—± š—¦š—°š—¶š—²š—»š˜š—¶š˜€š˜ š—®š˜ š—”š—Ŗš—¦

Transformers have gradually become a key component of many state-of-the-art natural language representation models. The recent transformer-based model BERT achieved state-of-the-art results on various natural language processing tasks, including GLUE, SQuAD v1.1, and SQuAD v2.0. This model, however, is computationally prohibitive and has a huge number of parameters.

In this work we revisit the architecture choices of BERT in an effort to obtain a lighter model. We focus on reducing the number of parameters, yet our methods can be applied towards other objectives such as FLOPs or latency.

We show that much more efficient light models can be obtained by reducing algorithmically chosen, correct architecture design dimensions rather than making the common choice of reducing the number of Transformer encoder layers. In particular, our method uncovers the usefulness of a non-standard design choice for multi-head attention layers that makes them much more efficient. By applying our findings, our schuBERT gives 6.6% higher average accuracy on the GLUE and SQuAD datasets compared to BERT with three encoder layers, while having the same number of parameters.
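
As a rough illustration of the idea (not the paper's actual code), the Python sketch below shows how shrinking the per-head dimension of a multi-head attention layer, rather than dropping encoder layers, reduces the parameter count. All class names, sizes, and the choice of head_dim here are hypothetical.

import torch
import torch.nn as nn

class SlimMultiHeadAttention(nn.Module):
    """Multi-head self-attention with a per-head dimension (head_dim)
    decoupled from hidden_dim / num_heads, so it can be tuned as an
    independent design dimension. Hypothetical illustration only."""
    def __init__(self, hidden_dim: int, num_heads: int, head_dim: int):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        inner = num_heads * head_dim              # total attention width
        self.q = nn.Linear(hidden_dim, inner)
        self.k = nn.Linear(hidden_dim, inner)
        self.v = nn.Linear(hidden_dim, inner)
        self.out = nn.Linear(inner, hidden_dim)   # project back to hidden_dim

    def forward(self, x):                         # x: (batch, seq, hidden_dim)
        b, s, _ = x.shape
        def split(t):                             # -> (batch, heads, seq, head_dim)
            return t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        y = (att @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out(y)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# BERT-base-style layer: 12 heads x 64 dims = 768 inner width.
full = SlimMultiHeadAttention(hidden_dim=768, num_heads=12, head_dim=64)
# Slimmer variant: same number of heads, smaller per-head dimension.
slim = SlimMultiHeadAttention(hidden_dim=768, num_heads=12, head_dim=32)
print(n_params(full), n_params(slim))  # slim layer has roughly half the parameters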

š—£š—®š—½š—²š—æ š—¹š—¶š—»š—ø: https://www.aclweb.org/anthology/2020.acl-main.250.pdf

š—•š—¶š—¼: Zohar Karnin received his Ph.D in computer science from the Technion, Israel Institute of Technology at 2011. His research interests are in the area of large scale and online machine learning algorithms. He is currently a Principal Scientist in Amazon AWS AI leading the science for multiple efforts in SageMaker, an environment for machine learning development.

---------------------

š—šš—”š—” š——š—¶š˜€š˜š—¶š—¹š—¹š—®š˜š—¶š—¼š—» š˜„š—¶š˜š—µ š—”š˜‚š˜š—¼š—šš—”š—”-š——š—¶š˜€š˜š—¶š—¹š—¹š—²š—æ – š—¬š—¼š—®š˜ƒ š—„š—®š—ŗš—¼š—», š— š—Ÿ š—˜š—»š—“š—¶š—»š—²š—²š—æ š—®š˜ š—›š—¶ š—”š˜‚š˜š—¼

GANs can get extremely big, reaching up to 1,200 GFLOPs (one GFLOP is one billion floating-point operations). For reference, MobileNet requires about 0.5 GFLOPs, roughly 2,400 times less compute.

This is why in many cases we want to lower the number of parameters of our GANs, whether to save costs when running in the cloud or to be able to run these networks on edge devices. The problem is that classical methods like pruning and model distillation, which work well for other networks, don't work well for GANs. AutoGAN-Distiller (Yonggan Fu et al.) is the first practical method for lowering the number of parameters of a GAN, and it does so with constrained AutoML techniques.
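
As a simplified, hypothetical sketch of the distillation idea (not the AGD repository's code), the snippet below trains a small student generator to match a frozen teacher's outputs, and checks a crude FLOPs estimate against a compute budget. In AutoGAN-Distiller the student architecture itself is searched under such a constraint; here the student is fixed, so the constraint is only reported. All module shapes and the budget are made up for illustration.

import torch
import torch.nn as nn

# Frozen "teacher" GAN generator and a much smaller student (toy stand-ins).
teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 3, 3, padding=1)).eval()
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 3, 3, padding=1))

def flops_estimate(model, hw=(64, 64)):
    # Crude conv FLOPs: 2 * k*k * c_in * c_out * H * W per layer.
    total = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            k = m.kernel_size[0]
            total += 2 * k * k * m.in_channels * m.out_channels * hw[0] * hw[1]
    return total

budget = 50e6  # hypothetical FLOPs budget for the compressed generator
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
for step in range(100):
    x = torch.randn(4, 3, 64, 64)            # stand-in generator inputs
    with torch.no_grad():
        target = teacher(x)                    # teacher outputs as "soft labels"
    loss = nn.functional.l1_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(flops_estimate(student) <= budget)       # does the student fit the budget?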

In my lecture I will talk about this research and also about a project I did that involved distilling MelGAN, a vocoder used for real-time text-to-speech generation.

š—£š—®š—½š—²š—æ š—¹š—¶š—»š—ø: https://arxiv.org/pdf/2006.08198v1.pdf

š—„š—²š—½š—¼: https://github.com/TAMU-VITA/AGD

š—•š—¶š—¼: Yoav Ramon is an ML Engineer and first worker at Hi Auto, A newly founded startup.

---------------------

š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š˜‚š˜€š—¶š—»š—“ š˜š—µš—² š—¹š—¶š—»š—ø š—¶š˜€ š—ŗš—®š—»š—±š—®š˜š—¼š—æš˜†!
š—„š—²š—“š—¶š˜€š˜š—æš—®š˜š—¶š—¼š—» š—¹š—¶š—»š—ø: https://floor28.co.il/event/f00dbba5-7f1a-4588-8fe4-a221e41b363a
