DataTalks HFA #17: Efficiency and Generalization in Modern Neural Networks

Hosted By
Bar Eini P. and Tom B.

Details

📢 Dive into the enigmatic world of modern neural networks at our upcoming Datahack Haifa meetup! Hosted by our great friends at Gav-Yam MATAM, this event features talks by leading experts Prof. Daniel Soudry and Dr. Brian Chmiel. They will share their research and practical insights in two areas: efficiency, how to reduce the computational demands of optimizing neural networks; and generalization, how deep networks manage to perform well on unseen data despite their complexity.

  • 15:30 MPB (mingling, pizza, beer)
  • 16:00 Prof. Daniel Soudry
  • 16:45 Dr. Brian Chmiel

♦ Location: Tech&Talk MATAM, Haifa
♦ Language: The event will be held in Hebrew
♦ Background: Basic knowledge of data science and machine learning is recommended
------------------------------------------------------------------------------------------------
Abstracts:

🚀 Why do typical deep networks generalize well?
Deep neural network models keep growing larger. However, it is not clear how such large ("over-parameterized") models can work at all, since classical ("worst-case") theory tells us they should overfit and fail to generalize to unseen data. Interestingly, we prove that random models that fit the data typically generalize well. This remains true even when they are over-parameterized, as long as the labels are generated by a "narrow teacher".
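For intuition, here is a minimal toy experiment in Python. It is our own illustrative sketch, not the construction from the talk, and the architecture, sizes, and sampling scheme below are all arbitrary assumptions. We rejection-sample random, wide "student" networks, keep only those that happen to fit the labels a narrow "teacher" assigns to a few training points, and measure how well the accepted students agree with the teacher on held-out points:

```python
# Toy illustration (an assumption-laden sketch, not the talk's proof):
# random over-parameterized "students" that happen to fit a narrow
# "teacher" tend to agree with that teacher on unseen data.
import numpy as np

rng = np.random.default_rng(0)
d, width, n_train, n_test = 10, 200, 8, 500

# Narrow teacher: a single random linear threshold unit.
w_teacher = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = np.sign(X_train @ w_teacher)
y_test = np.sign(X_test @ w_teacher)

def random_student():
    """A wide, random one-hidden-layer ReLU network with a sign readout."""
    W1 = rng.normal(size=(d, width))
    w2 = rng.normal(size=width)
    return lambda X: np.sign(np.maximum(X @ W1, 0.0) @ w2)

# Rejection-sample students until 20 of them interpolate the training set.
test_accs = []
for _ in range(50_000):
    f = random_student()
    if np.all(f(X_train) == y_train):  # student fits the data exactly
        test_accs.append(np.mean(f(X_test) == y_test))
        if len(test_accs) == 20:
            break

print(f"found {len(test_accs)} random interpolators; "
      f"mean test agreement with teacher: {np.mean(test_accs):.2f}")
```

If the flavor of the result carries over to this toy setting, the accepted random interpolators should agree with the teacher well above the 50% chance level.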

Daniel Soudry is an associate professor and Schmidt Career Advancement Chair in AI in the Electrical and Computer Engineering Department at the Technion, working in the areas of machine learning and neural networks. His recent work focuses on resource efficiency and implicit bias in neural networks. He is a member of Israel's Young Academy and the recipient of the Gruss Lipper Fellowship, the Goldberg Award, an ERC Starting Grant, and Intel's Rising Star Faculty Award.

🤖 Overcoming 4-bit quantization challenges in large language model training
In recent years, natural language processing (NLP) has been transformed by large language models (LLMs), which excel in contextual understanding and reasoning as their size increases. However, the growth of these models comes with significant computational demands, making training and inference highly resource-intensive. To address this, quantization has emerged as a key technique, reducing memory usage by lowering the bit widths of model components without sacrificing performance. Despite its potential, standard 4-bit quantization formats have struggled with LLM training due to the long-tail distribution of neural gradients and the need for unbiased gradients for effective stochastic gradient descent (SGD). To overcome this, we propose a modified 4-bit format that addresses these challenges, enabling successful LLM training with comparable results to higher-bit representations.
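To make the unbiasedness point concrete, here is a small Python sketch using a plain uniform 4-bit grid; it illustrates the general failure mode, not the modified format proposed in the talk. With long-tailed gradients, one large outlier sets the quantization scale, so most gradients sit far below a single quantization step: round-to-nearest flushes them all to zero (a systematic bias), while stochastic rounding preserves their mean, keeping the gradient estimate unbiased as SGD requires:

```python
# Illustrative sketch (uniform 4-bit grid, not the talk's modified format):
# why unbiased (stochastic) rounding matters for long-tailed gradients.
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, n_bits=4, stochastic=False):
    """Uniform symmetric quantization to a signed n_bits grid."""
    n_levels = 2 ** (n_bits - 1) - 1       # 7 positive levels for 4 bits
    scale = np.max(np.abs(x)) / n_levels   # the largest value sets the scale
    y = x / scale
    if stochastic:
        # Round up with probability equal to the fractional part,
        # so E[quantize(x)] == x (unbiased).
        lo = np.floor(y)
        y = lo + (rng.random(y.shape) < (y - lo))
    else:
        y = np.round(y)                    # round-to-nearest: biased near zero
    return np.clip(y, -n_levels, n_levels) * scale

# Long tail: one huge gradient plus many tiny ones.
grads = np.concatenate([[100.0], np.full(100_000, 0.03)])

q_det = quantize(grads, stochastic=False)
q_sto = quantize(grads, stochastic=True)
print("true mean of the small grads:", grads[1:].mean())   # 0.03
print("round-to-nearest mean:       ", q_det[1:].mean())   # 0.0, all flushed
print("stochastic rounding mean:    ", q_sto[1:].mean())   # ~0.03 on average
```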

Brian is a researcher at Habana Labs-Intel, specializing in deep learning and optimization. He has authored several papers at top conferences such as NeurIPS, ICLR, and ICML, garnering hundreds of citations. Brian earned his PhD from the Technion, where he was supervised by Prof. Daniel Soudry and Prof. Alex Bronstein.
