Adaptive Gradient Methods for Stochastic Optimization

Details
This is the first talk in the Blue Yonder Series on Optimization for Machine Learning. The series is aimed at attendees who apply Machine Learning models and want to learn more about the underlying weight optimization algorithms. In particular, the strengths and weaknesses of different methods are compared. Some basic knowledge of Optimization Theory and Algorithms is recommended.
Our first speaker, Anton Rodomanov, is currently a Postdoctoral Researcher at the CISPA Helmholtz Center for Information Security in Saarbrücken. He obtained his PhD at UCLouvain in Louvain-la-Neuve, Belgium, under the supervision of Yurii Nesterov.
His research interests and expertise lie in Continuous Optimization, with a keen interest in the complexity analysis of optimization algorithms and the development of new, efficient methods. Anton's focus is on the practical applications of optimization, particularly in the field of Machine Learning.
He has an outstanding scientific record, including publications on Distributed and Federated Optimization, Gradient Methods, Quasi-Newton Methods, and Nonconvex Optimization.
Abstract of the Talk
The stochastic gradient method (SGD) is a foundational algorithm for solving optimization problems in Machine Learning and beyond. However, its performance heavily depends on selecting appropriate stepsizes, a task that can be challenging and time-consuming.
To address this, adaptive variants of SGD, such as AdaGrad and Adam, have become widely adopted due to their ease of use and efficiency. In this talk, we will explore some of these methods through the lens of modern optimization theory and provide some insights into why adaptive methods often outperform classical SGD in practice.
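To illustrate the difference between a fixed stepsize and an adaptive one, here is a minimal sketch in Python/NumPy, not taken from the talk, comparing plain SGD with AdaGrad on a toy least-squares problem. All names (A, b, grad, run_sgd, run_adagrad) and parameter values are illustrative assumptions.

import numpy as np

# Minimal sketch: SGD vs. AdaGrad on a toy least-squares problem.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
b = rng.normal(size=100)

def grad(x, batch):
    # Stochastic gradient of 0.5 * ||A x - b||^2 over a mini-batch of rows.
    Ab, bb = A[batch], b[batch]
    return Ab.T @ (Ab @ x - bb)

def run_sgd(steps=500, lr=1e-3):
    x = np.zeros(10)
    for _ in range(steps):
        batch = rng.choice(100, size=10, replace=False)
        x -= lr * grad(x, batch)          # fixed stepsize: must be tuned by hand
    return x

def run_adagrad(steps=500, lr=0.5, eps=1e-8):
    x = np.zeros(10)
    G = np.zeros(10)                      # running sum of squared gradients
    for _ in range(steps):
        batch = rng.choice(100, size=10, replace=False)
        g = grad(x, batch)
        G += g**2
        x -= lr * g / (np.sqrt(G) + eps)  # per-coordinate adaptive stepsize
    return x

print("SGD loss:    ", 0.5 * np.linalg.norm(A @ run_sgd() - b)**2)
print("AdaGrad loss:", 0.5 * np.linalg.norm(A @ run_adagrad() - b)**2)

The key point of the sketch is that AdaGrad rescales each coordinate of the gradient by the accumulated squared gradient history, so a single global stepsize needs far less hand-tuning than the fixed learning rate in the SGD loop.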