This event was canceled

DataTalks #32: Optimizing Elements of BERT & AutoGAN-Distiller ✔️🧠

Our 32nd DataTalks meetup will be held online and will focus on optimizing and distilling neural networks.

š—­š—¼š—¼š—ŗ š—¹š—¶š—»š—ø: https://us02web.zoom.us/j/89415600010?pwd=bEpoOHIwV3pRbEFWL0RrT0NkR2dSUT09

š—”š—“š—²š—»š—±š—®:
šŸ”· 18:00 - Opening words
šŸ”¶ 18:10 - 19:00 – schuBERT: Optimizing Elements of BERT – Zohar Karnin, Principal Applied Scientist at AWS
šŸ”“ 19:00 - 19:50 – GAN Distillation with AutoGAN-Distiller – Yoav Ramon, ML Engineer at Hi Auto

---------------------

š˜€š—°š—µš˜‚š—•š—˜š—„š—§: š—¢š—½š˜š—¶š—ŗš—¶š˜‡š—¶š—»š—“ š—˜š—¹š—²š—ŗš—²š—»š˜š˜€ š—¼š—³ š—•š—˜š—„š—§ – š—­š—¼š—µš—®š—æ š—žš—®š—æš—»š—¶š—», š—£š—æš—¶š—»š—°š—¶š—½š—®š—¹ š—”š—½š—½š—¹š—¶š—²š—± š—¦š—°š—¶š—²š—»š˜š—¶š˜€š˜ š—®š˜ š—”š—Ŗš—¦

Transformers (Vaswani et al., 2017) have gradually become a key component for many state-of-the-art natural-language-representation models. A recent Transformer-based model — BERT (Devlin et al., 2018) — achieved state-of-the-art results on various natural-language-processing tasks, including GLUE, SQuAD v1.1, and SQuAD v2.0.

This model, however, is computationally prohibitive and has a huge number of parameters. In this work we revisit the architecture choices of BERT in an effort to obtain a lighter model. We focus on reducing the number of parameters, yet our methods can be applied to other objectives such as FLOPs or latency.

We show that much more efficient, lighter BERT models can be obtained by shrinking algorithmically chosen architecture design dimensions rather than simply reducing the number of Transformer encoder layers. In particular, our schuBERT gives 6.6% higher average accuracy on the GLUE and SQuAD datasets than a BERT with three encoder layers while having the same number of parameters.
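
As a rough illustration of that distinction (this is not the schuBERT search itself, and the shrunken dimension values below are arbitrary), the two shrinking strategies can be compared with the Hugging Face transformers library:

# Sketch: two ways of making BERT smaller, compared by parameter count.
# The concrete dimension values are illustrative, not the ones found by schuBERT.
from transformers import BertConfig, BertModel

def param_count(config: BertConfig) -> int:
    # Build a randomly initialized model just to count its parameters.
    return BertModel(config).num_parameters()

base = BertConfig()  # BERT-base defaults: 12 layers, hidden 768, 12 heads, FFN 3072
slim = BertConfig(hidden_size=384, num_attention_heads=6, intermediate_size=1024)  # keep 12 layers, shrink the design dimensions
shallow = BertConfig(num_hidden_layers=3)  # keep the dimensions, drop encoder layers

for name, cfg in [("BERT-base", base), ("slim", slim), ("shallow", shallow)]:
    print(f"{name}: {param_count(cfg) / 1e6:.1f}M parameters")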

š—£š—®š—½š—²š—æ š—¹š—¶š—»š—ø: https://www.amazon.science/publications/schubert-optimizing-elements-of-bert

š—•š—¶š—¼: Zohar is a Principal Research Scientist at Amazon.

---------------------

š—šš—”š—” š——š—¶š˜€š˜š—¶š—¹š—¹š—®š˜š—¶š—¼š—» š˜„š—¶š˜š—µ š—”š˜‚š˜š—¼š—šš—”š—”-š——š—¶š˜€š˜š—¶š—¹š—¹š—²š—æ – š—¬š—¼š—®š˜ƒ š—„š—®š—ŗš—¼š—», š— š—Ÿ š—˜š—»š—“š—¶š—»š—²š—²š—æ š—®š˜ š—›š—¶ š—”š˜‚š˜š—¼

GANs can get extremely big, reaching up to 1,200 GFLOPs (a GFLOP is one billion floating-point operations). For reference, MobileNet requires about 0.5 GFLOPs.
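
For a sense of where a given network sits on this scale, one can count its per-image multiply-accumulate operations with a profiler; the sketch below uses fvcore and torchvision's MobileNetV2 purely as an illustration:

import torch
import torchvision
from fvcore.nn import FlopCountAnalysis

# Illustrative only: count the multiply-accumulates MobileNetV2 spends on one image.
model = torchvision.models.mobilenet_v2(weights=None).eval()
image = torch.randn(1, 3, 224, 224)

flops = FlopCountAnalysis(model, image)
# Roughly a third of a billion multiply-accumulates per image, i.e. well under
# 1 GFLOP, versus hundreds of GFLOPs for large image-to-image GAN generators.
print(f"{flops.total() / 1e9:.2f} GMACs per image")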

This is why, in many cases, we want to reduce the number of parameters of our GANs, both to save costs when running in the cloud and to be able to run these networks on edge devices. The problem is that classical methods such as pruning and model distillation, which work well for other networks, do not work well for GANs. AutoGAN-Distiller (Yonggan Fu et al.) is the first practical way to reduce the number of parameters of such GANs, and it does so with constrained AutoML techniques.
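
The sketch below shows only the basic distillation idea: a small student generator is trained to match the outputs of a large, frozen teacher generator. It is not the AutoGAN-Distiller method itself, which additionally searches for the student architecture under a compute budget; all module names and sizes here are made up for illustration.

import torch
import torch.nn as nn

def make_generator(width: int) -> nn.Sequential:
    # Toy image-to-image generator; `width` stands in for the channel widths
    # that AutoGAN-Distiller would search over.
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, 3, 3, padding=1),
    )

teacher = make_generator(width=256).eval()   # stands in for a pretrained, expensive generator
student = make_generator(width=32)           # much narrower, hence far fewer FLOPs

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
criterion = nn.L1Loss()  # plain output matching; the real method uses richer objectives

for step in range(100):                      # stands in for iterating over a real dataset
    x = torch.randn(8, 3, 64, 64)            # dummy input batch
    with torch.no_grad():
        target = teacher(x)                  # frozen teacher provides the targets
    loss = criterion(student(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()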

In my lecture I will talk about this research, and also about a project I did that involved distilling MelGAN, a vocoder used for real-time text-to-speech generation.

š—£š—®š—½š—²š—æ š—¹š—¶š—»š—ø: https://arxiv.org/pdf/2006.08198v1.pdf

š—„š—²š—½š—¼: https://github.com/TAMU-VITA/AGD

š—•š—¶š—¼: Yoav Ramon is an ML Engineer and first worker at Hi Auto, A newly founded startup.

---------------------

š—­š—¼š—¼š—ŗ š—¹š—¶š—»š—ø: https://us02web.zoom.us/j/89415600010?pwd=bEpoOHIwV3pRbEFWL0RrT0NkR2dSUT09
