Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules - HAL Open Archive
Preprint, Working Paper. Year: 2021

Abstract

Applying a stochastic gradient descent method to minimize an objective gives rise to a discrete-time process of estimated parameter values. To better understand the dynamics of these estimates, it can make sense to approximate the discrete-time process with a continuous-time diffusion. We refine several results on the weak error of diffusion approximations. In particular, we explicitly compute the leading term in the error expansion of an ODE approximation with respect to a parameter h discretizing the learning rate schedule. The leading term changes if one extends the ODE with a Brownian diffusion component. Finally, we show that if the learning rate is time-varying, then its rate of change needs to enter the drift coefficient in order to obtain an approximation of order 2.
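
To fix ideas, here is a minimal sketch of the objects involved; the notation below is assumed for illustration and is not necessarily that of the paper. Given an objective $f$, a learning rate schedule $\eta$, a discretization parameter $h > 0$, and unbiased gradient estimates $g(\theta, \xi)$ with $\mathbb{E}[g(\theta, \xi)] = \nabla f(\theta)$, the SGD iterates read

\[
\theta_{k+1} = \theta_k - h \, \eta(kh) \, g(\theta_k, \xi_k), \qquad k = 0, 1, 2, \ldots
\]

A first-order continuous-time approximation is the ODE $\mathrm{d}\Theta_t = -\eta(t) \, \nabla f(\Theta_t) \, \mathrm{d}t$. Extending it with a Brownian diffusion component of order $\sqrt{h}$, for instance

\[
\mathrm{d}\Theta_t = -\eta(t) \, \nabla f(\Theta_t) \, \mathrm{d}t + \sqrt{h} \, \eta(t) \, \Sigma(\Theta_t)^{1/2} \, \mathrm{d}W_t,
\]

with $\Sigma$ the covariance of the gradient noise, changes the leading term of the weak error expansion in $h$. The final claim of the abstract then says that, for a time-varying $\eta$, a weak approximation of order 2 additionally requires a drift correction involving the rate of change $\eta'(t)$; its exact form is derived in the paper.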
Main file
DiffApproxSGD_v2.pdf (543.32 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03262396 , version 1 (16-06-2021)
hal-03262396 , version 2 (07-10-2021)
hal-03262396 , version 3 (27-02-2023)

Identifiers

  • HAL Id: hal-03262396, version 2

Cite

Stefan Ankirchner, Stefan Perko. Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules. 2021. ⟨hal-03262396v2⟩
