Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

Umut Şimşekli; Lingjiong Zhu; Yee Whye Teh; Mert Gürbüzbalaban

Conference paper, Year: 2020


Umut Şimşekli

Role: Author
PersonId : 6757
IdHAL : umut-simsekli
IdRef : 250884003

Institut Polytechnique de Paris

Lingjiong Zhu

Role: Author

Département Images, Données, Signal

Signal, Statistique et Apprentissage

Yee Whye Teh

Role: Author

Mert Gürbüzbalaban

Role: Author

Abstract

Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning, where the problem is non-convex and the gradient noise might exhibit heavy-tailed behavior, as empirically observed in recent studies. In this study, we consider a \emph{continuous-time} variant of SGDm, known as the underdamped Langevin dynamics (ULD), and investigate its asymptotic properties under heavy-tailed perturbations. Supported by recent studies from statistical physics, we argue both theoretically and empirically that the heavy tails of such perturbations can result in a bias even when the step-size is small, in the sense that \emph{the optima of the stationary distribution} of the dynamics might not match \emph{the optima of the cost function to be optimized}. As a remedy, we develop a novel framework, which we call \emph{fractional} ULD (FULD), and prove that FULD targets the so-called Gibbs distribution, whose optima exactly match the optima of the original cost. We observe that the Euler discretization of FULD has noteworthy algorithmic similarities with \emph{natural gradient} methods and \emph{gradient clipping}, bringing a new perspective on understanding their role in deep learning. We support our theory with experiments conducted on a synthetic model and neural networks.
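
As a rough illustration of the link between SGDm and ULD mentioned in the abstract, the sketch below applies a plain Euler step to the standard ULD with Gaussian noise in one dimension. It is not the FULD scheme of the paper and does not model heavy-tailed noise. The function name `uld_euler` and the parameters `eta` (step-size), `gamma` (friction), and `beta` (inverse temperature) are assumptions introduced here for illustration.

```python
import math
import random


def uld_euler(grad, x0, v0, eta, gamma, beta, n_steps, rng):
    """Euler discretization of standard 1-D ULD (illustrative sketch only).

    Continuous-time model assumed here (Gaussian noise, not heavy-tailed):
        dv = -(gamma * v + grad(x)) dt + sqrt(2 * gamma / beta) dB
        dx = v dt
    """
    x, v = x0, v0
    # Standard deviation of the injected noise over one step of size eta.
    noise_scale = math.sqrt(2.0 * gamma * eta / beta)
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)
        # Velocity update: equivalent to v = (1 - eta * gamma) * v - eta * grad(x) + noise,
        # i.e. a momentum recursion with momentum coefficient 1 - eta * gamma.
        v = v - eta * (gamma * v + grad(x)) + noise_scale * xi
        # Position update uses the freshly updated velocity.
        x = x + eta * v
    return x, v


if __name__ == "__main__":
    # Hypothetical quadratic cost f(x) = 0.5 * (x - 1)^2, so grad f(x) = x - 1.
    final_x, final_v = uld_euler(
        grad=lambda x: x - 1.0,
        x0=0.0, v0=0.0,
        eta=0.01, gamma=1.0, beta=10.0,
        n_steps=5000, rng=random.Random(0),
    )
    print(final_x, final_v)
```

With the noise switched off (for example `beta=float("inf")`), the recursion reduces to a deterministic momentum method and the iterate settles at the minimizer of the quadratic. With Gaussian noise, the iterates instead fluctuate around it.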

Domains

Machine Learning [stat.ML]


https://hal.science/hal-03269112

Submitted on: Wednesday, June 23, 2021, 16:59:58

Last modified on: Monday, October 9, 2023, 12:49:43

Dates and versions

hal-03269112, version 1 (23-06-2021)

Identifiers

HAL Id: hal-03269112, version 1
arXiv: 2002.05685

Cite

Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban. Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise. International Conference on Machine Learning, 2020, Online, France. ⟨hal-03269112⟩


Collections

INSTITUT-TELECOM LTCI IDS S2A IP_PARIS ANR

