Adaptive scaling of the learning rate by second order automatic differentiation

Abstract : In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. This technique relies on the computation of the curvature, a second-order quantity whose computational complexity lies between that of the gradient and that of the Hessian-vector product. If (1C,1M) denotes respectively the computational time and memory footprint of the gradient method, the new technique increases the overall cost to either (1.5C,2M) or (2C,1M). This rescaling has the appealing characteristic of a natural interpretation: it allows the practitioner to choose between exploration of the parameter set and convergence of the algorithm. The rescaling is adaptive: it depends on the data and on the direction of descent. The numerical experiments highlight the different exploration/convergence regimes.
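To make the idea of "rescaling the learning rate with the curvature along the descent direction" concrete, here is a minimal sketch in plain Python. It is not the authors' technique and does not reproduce their (1.5C,2M) or (2C,1M) cost: it swaps in a generic second-order forward-mode evaluation (a truncated Taylor "jet" of phi(t) = f(x + t d)) to obtain the directional curvature d^T H d, and then assumes a one-dimensional Newton step, lr = -phi'(0)/phi''(0), as the rescaling rule. The names `Jet`, `directional_jet`, `gradient`, `curvature_scaled_step` and `fallback_lr` are hypothetical, and the exploration/convergence trade-off described above is not modeled.

```python
from typing import Callable, List, Sequence, Tuple, Union

Number = Union[int, float]


class Jet:
    """Hypothetical helper: (phi(0), phi'(0), phi''(0)) for phi(t) = f(x + t*d)."""

    __slots__ = ("v", "d1", "d2")

    def __init__(self, v: float, d1: float = 0.0, d2: float = 0.0) -> None:
        self.v, self.d1, self.d2 = v, d1, d2

    @staticmethod
    def _lift(other: "Union[Jet, Number]") -> "Jet":
        # Constants do not depend on t, so their derivatives are zero.
        return other if isinstance(other, Jet) else Jet(float(other))

    def __add__(self, other):
        o = Jet._lift(other)
        return Jet(self.v + o.v, self.d1 + o.d1, self.d2 + o.d2)

    __radd__ = __add__

    def __neg__(self):
        return Jet(-self.v, -self.d1, -self.d2)

    def __sub__(self, other):
        return self + (-Jet._lift(other))

    def __rsub__(self, other):
        return Jet._lift(other) + (-self)

    def __mul__(self, other):
        o = Jet._lift(other)
        # Leibniz rule up to second order: (uv)'' = u''v + 2u'v' + uv''.
        return Jet(
            self.v * o.v,
            self.d1 * o.v + self.v * o.d1,
            self.d2 * o.v + 2.0 * self.d1 * o.d1 + self.v * o.d2,
        )

    __rmul__ = __mul__


def directional_jet(f: Callable, x: Sequence[float], d: Sequence[float]) -> Tuple[float, float, float]:
    """Return (f(x), grad f(x) . d, d^T H(x) d) in one forward pass."""
    # Seed each input as x_i + t*d_i: value x_i, first derivative d_i, second 0.
    out = f([Jet(xi, di, 0.0) for xi, di in zip(x, d)])
    return out.v, out.d1, out.d2


def gradient(f: Callable, x: Sequence[float]) -> List[float]:
    # One forward pass per coordinate; only acceptable for a toy example.
    n = len(x)
    return [directional_jet(f, x, [1.0 if j == i else 0.0 for j in range(n)])[1]
            for i in range(n)]


def curvature_scaled_step(f: Callable, x: Sequence[float],
                          fallback_lr: float = 1e-2) -> Tuple[List[float], float]:
    """One descent step along d = -grad, with lr chosen from the curvature."""
    d = [-gi for gi in gradient(f, x)]
    _, slope, curv = directional_jet(f, x, d)
    # Minimizer of the local model phi(0) + slope*t + curv*t^2/2 when curv > 0;
    # otherwise (non-convex direction) fall back to a fixed learning rate.
    lr = -slope / curv if curv > 0 else fallback_lr
    return [xi + lr * di for xi, di in zip(x, d)], lr
```

For example, with `loss = lambda p: p[0] * p[0] + 3 * p[1] * p[1]` and `x = [1.0, 1.0]`, the gradient is (2, 6) and d^T H d = 224, so the sketch picks lr = 40/224, which is the exact line-search step for this quadratic. The directional curvature makes the step size depend on both the data and the descent direction, which is the kind of adaptivity the abstract describes.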
Document type :
Preprints, Working Papers, ...
https://hal.archives-ouvertes.fr/hal-03748574
Contributor : Alban Gossard
Submitted on : Tuesday, October 25, 2022 - 5:55:30 PM
Last modification on : Thursday, October 27, 2022 - 4:11:36 AM

Files

Files produced by the author(s)

Identifiers

  • HAL Id : hal-03748574, version 2
  • ARXIV : 2210.14520

Citation

Frédéric de Gournay, Alban Gossard. Adaptive scaling of the learning rate by second order automatic differentiation. 2022. ⟨hal-03748574v2⟩
