Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules - HAL Open Archive
Preprint, Working Paper. Year: 2021

Abstract

Applying a stochastic gradient descent method to minimize an objective gives rise to a discrete-time process of estimated parameter values. To better understand the dynamics of these estimates, it can make sense to approximate the discrete-time process with a continuous-time diffusion. We refine several results on the weak error of diffusion approximations. In particular, we explicitly compute the leading term in the error expansion of an ODE approximation with respect to a parameter h discretizing the learning rate schedule. The leading term changes if one extends the ODE with a Brownian diffusion component. Finally, we show that if the learning rate is time-varying, then its rate of change needs to enter the drift coefficient in order to obtain an approximation of order 2.
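
To fix ideas, here is a minimal sketch of the objects involved; the notation below is assumed for illustration and is not necessarily that of the paper. Given an objective $f$, a learning rate schedule $\eta$, a discretization parameter $h > 0$, and unbiased gradient estimates $g(\theta, \xi)$ with $\mathbb{E}[g(\theta, \xi)] = \nabla f(\theta)$, the SGD iterates read

\[
\theta_{k+1} = \theta_k - h \, \eta(kh) \, g(\theta_k, \xi_k), \qquad k = 0, 1, 2, \ldots
\]

A first-order continuous-time approximation is the ODE $\mathrm{d}\Theta_t = -\eta(t) \, \nabla f(\Theta_t) \, \mathrm{d}t$. Extending it with a Brownian diffusion component of order $\sqrt{h}$, for instance

\[
\mathrm{d}\Theta_t = -\eta(t) \, \nabla f(\Theta_t) \, \mathrm{d}t + \sqrt{h} \, \eta(t) \, \Sigma(\Theta_t)^{1/2} \, \mathrm{d}W_t,
\]

with $\Sigma$ the covariance of the gradient noise, changes the leading term of the weak error expansion in $h$. The final claim of the abstract then says that, for a time-varying $\eta$, a weak approximation of order 2 additionally requires a drift correction involving the rate of change $\eta'(t)$; its exact form is derived in the paper.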
Main file
DiffApproxSGD_v2.pdf (543.32 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03262396 , version 1 (16-06-2021)
hal-03262396 , version 2 (07-10-2021)
hal-03262396 , version 3 (27-02-2023)

Identifiers

  • HAL Id: hal-03262396, version 2

Cite

Stefan Ankirchner, Stefan Perko. Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules. 2021. ⟨hal-03262396v2⟩
