# Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules

Abstract: Applying a stochastic gradient descent method for minimizing an objective gives rise to a discrete-time process of estimated parameter values. In order to better understand the dynamics of the estimated values, it can make sense to approximate the discrete-time process with a continuous-time diffusion. We refine some results on the weak error of diffusion approximations. In particular, we explicitly compute the leading term in the error expansion of an ODE approximation with respect to a parameter h discretizing the learning rate schedule. The leading term changes if one extends the ODE with a Brownian diffusion component. Finally, we show that if the learning rate is time-varying, then its rate of change needs to enter the drift coefficient in order to obtain an approximation of order 2.
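The abstract's first claim can be illustrated numerically. The following is a minimal sketch (not the paper's setting or notation): SGD with step size h on the quadratic objective f(theta) = theta^2/2 with additive gradient noise, compared at a fixed time horizon against its ODE approximation theta'(t) = -theta(t). The weak error in the mean shrinks roughly linearly in h, consistent with a leading error term of order h.

```python
import numpy as np

# Illustrative toy example (an assumption, not taken from the paper):
# SGD on f(theta) = theta^2 / 2 with additive gradient noise,
#     theta_{k+1} = theta_k - h * (theta_k + xi_k),  xi_k ~ N(0, 1),
# compared with its ODE approximation theta'(t) = -theta(t),
# whose solution is theta(t) = theta_0 * exp(-t).
rng = np.random.default_rng(0)

def sgd_mean(theta0, h, n_steps, n_paths=100_000):
    """Monte Carlo estimate of E[theta_n] after n_steps of SGD."""
    theta = np.full(n_paths, theta0)
    for _ in range(n_steps):
        theta -= h * (theta + rng.standard_normal(n_paths))
    return theta.mean()

theta0, T = 1.0, 1.0
ode = theta0 * np.exp(-T)  # ODE approximation evaluated at time T
for h in (0.1, 0.05, 0.025):
    err = abs(sgd_mean(theta0, h, round(T / h)) - ode)
    print(f"h = {h:5.3f}   weak error ~ {err:.4f}")
```

Halving h roughly halves the error, i.e. the ODE approximation is weakly first-order accurate in this toy example; the paper's contribution is to compute such leading error terms explicitly and to analyze how they change under a diffusion correction and a time-varying learning rate.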
Document type: Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-03262396
Contributor: Stefan Ankirchner
Submitted on: Thursday, October 7, 2021 - 3:44:15 PM
Last modification on: Tuesday, October 12, 2021 - 3:04:21 AM

### File

DiffApproxSGD_v2.pdf
Files produced by the author(s)

### Identifiers

• HAL Id: hal-03262396, version 2

### Citation

Stefan Ankirchner, Stefan Perko. Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules. 2021. ⟨hal-03262396v2⟩
