A comparison of continuous-time approximations to stochastic gradient descent - HAL open archive
Preprint, Working Paper. Year: 2023

A comparison of continuous-time approximations to stochastic gradient descent

Abstract

Applying a stochastic gradient descent (SGD) method for minimizing an objective gives rise to a discrete-time process of estimated parameter values. In order to better understand the dynamics of the estimated values, many authors have considered continuous-time approximations of SGD. We refine existing results on the weak error of first-order ODE and SDE approximations to SGD for non-infinitesimal learning rates. In particular, we explicitly compute the leading term in the error expansion of gradient flow and two of its stochastic counterparts, with respect to a discretization parameter h. In the example of linear regression, we demonstrate the general inferiority of the deterministic gradient flow approximation in comparison to the stochastic ones. Further, we demonstrate that for Gaussian features both SDE approximations are equally good. However, for leptokurtic features we find that the SDE approximation with state-dependent diffusion coefficient is of higher quality than the approximation with state-independent noise. Moreover, the relationship reverses for platykurtic features.
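The setting of the abstract can be illustrated with a small numerical sketch (not the authors' code; the model, noise scale, and step size below are illustrative assumptions): for one-dimensional linear regression with Gaussian features, both the discrete-time SGD iterates and an Euler discretization of the deterministic gradient flow drive the parameter toward the true slope, while SGD retains stochastic fluctuations of order h.

```python
import numpy as np

# Illustrative sketch: SGD on 1-D linear regression versus the
# deterministic gradient flow, the simplest of the continuous-time
# approximations discussed in the abstract.
rng = np.random.default_rng(0)
n, h, steps = 1000, 0.05, 200
x = rng.normal(size=n)                        # Gaussian features
y = 2.0 * x + rng.normal(scale=0.1, size=n)   # true slope is 2.0

# SGD: one uniformly sampled data point per step, squared-error loss.
theta_sgd = 0.0
for _ in range(steps):
    i = rng.integers(n)
    grad = (theta_sgd * x[i] - y[i]) * x[i]
    theta_sgd -= h * grad

# Gradient flow: Euler discretization of d theta/dt = -E[grad],
# with the full-sample gradient standing in for the expectation.
theta_flow = 0.0
for _ in range(steps):
    grad = np.mean((theta_flow * x - y) * x)
    theta_flow -= h * grad

print(theta_sgd, theta_flow)  # both end up close to the true slope 2.0
```

The gradient-flow iterate contracts deterministically toward the minimizer, whereas the SGD iterate fluctuates around it; the SDE approximations studied in the paper are designed to capture exactly this residual stochasticity.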
Main file

comparison_continuous_time_SGD_HAL.pdf (682.73 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03262396 , version 1 (16-06-2021)
hal-03262396 , version 2 (07-10-2021)
hal-03262396 , version 3 (27-02-2023)

Identifiers

  • HAL Id : hal-03262396 , version 3

Cite

Stefan Ankirchner, Stefan Perko. A comparison of continuous-time approximations to stochastic gradient descent. 2023. ⟨hal-03262396v3⟩
369 views
430 downloads
