A comparison of continuous-time approximations to stochastic gradient descent - HAL open archive
Preprint, Working Paper. Year: 2023

A comparison of continuous-time approximations to stochastic gradient descent

Abstract

Applying a stochastic gradient descent (SGD) method for minimizing an objective gives rise to a discrete-time process of estimated parameter values. In order to better understand the dynamics of the estimated values, many authors have considered continuous-time approximations of SGD. We refine existing results on the weak error of first-order ODE and SDE approximations to SGD for non-infinitesimal learning rates. In particular, we explicitly compute the leading term in the error expansion of gradient flow and two of its stochastic counterparts, with respect to a discretization parameter h. In the example of linear regression, we demonstrate the general inferiority of the deterministic gradient flow approximation in comparison to the stochastic ones. Further, we demonstrate that for Gaussian features both SDE approximations are equally good. However, for leptokurtic features we find that the SDE approximation with state-dependent diffusion coefficient is of higher quality than the approximation with state-independent noise. Moreover, the relationship reverses for platykurtic features.
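The setting of the abstract can be illustrated with a small numerical sketch (not the authors' code; the model, noise scale, and step size below are illustrative assumptions): for one-dimensional linear regression with Gaussian features, both the discrete-time SGD iterates and an Euler discretization of the deterministic gradient flow drive the parameter toward the true slope, while SGD retains stochastic fluctuations of order h.

```python
import numpy as np

# Illustrative sketch: SGD on 1-D linear regression versus the
# deterministic gradient flow, the simplest of the continuous-time
# approximations discussed in the abstract.
rng = np.random.default_rng(0)
n, h, steps = 1000, 0.05, 200
x = rng.normal(size=n)                        # Gaussian features
y = 2.0 * x + rng.normal(scale=0.1, size=n)   # true slope is 2.0

# SGD: one uniformly sampled data point per step, squared-error loss.
theta_sgd = 0.0
for _ in range(steps):
    i = rng.integers(n)
    grad = (theta_sgd * x[i] - y[i]) * x[i]
    theta_sgd -= h * grad

# Gradient flow: Euler discretization of d theta/dt = -E[grad],
# with the full-sample gradient standing in for the expectation.
theta_flow = 0.0
for _ in range(steps):
    grad = np.mean((theta_flow * x - y) * x)
    theta_flow -= h * grad

print(theta_sgd, theta_flow)  # both end up close to the true slope 2.0
```

The gradient-flow iterate contracts deterministically toward the minimizer, whereas the SGD iterate fluctuates around it; the SDE approximations studied in the paper are designed to capture exactly this residual stochasticity.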
Main file

comparison_continuous_time_SGD_HAL.pdf (682.73 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03262396 , version 1 (16-06-2021)
hal-03262396 , version 2 (07-10-2021)
hal-03262396 , version 3 (27-02-2023)

Identifiers

  • HAL Id : hal-03262396 , version 3

Cite

Stefan Ankirchner, Stefan Perko. A comparison of continuous-time approximations to stochastic gradient descent. 2023. ⟨hal-03262396v3⟩
369 views
430 downloads
