
Adaptive scaling of the learning rate by second order automatic differentiation

Abstract : In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. If (1C, 1M) denotes, respectively, the computational time and memory footprint of the gradient method, the new technique increases the overall cost to either (1.5C, 2M) or (2C, 1M). This rescaling has the appealing characteristic of having a natural interpretation: it allows the practitioner to choose between exploration of the parameter set and convergence of the algorithm. The rescaling is adaptive: it depends on the data and on the direction of descent. The rescaling is tested using the simple strategy of exponential decay, a method with comprehensive hyperparameters that requires no tuning. When compared to standard algorithms with optimized hyperparameters, this algorithm exhibits similar convergence rates and is also empirically shown to be more stable than standard methods.
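To make the idea of a curvature-based rescaling concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a generic rule in which the step along the descent direction u is a dimensionless factor `ell` times the minimizer of the local quadratic model of t -> f(x + t*u), and it decays `ell` exponentially over iterations. Finite differences stand in for the second order automatic differentiation described in the abstract, and every name (`rescaled_step`, `ell`, `decay`, `fallback_lr`, `quadratic`) is hypothetical.

```python
import math

def gradient(f, x, h=1e-5):
    """Central finite-difference gradient of f at x (a list of floats)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def directional_derivatives(f, x, u, h=1e-4):
    """First and second derivatives of t -> f(x + t*u) at t = 0."""
    f_plus = f([xi + h * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - h * ui for xi, ui in zip(x, u)])
    d1 = (f_plus - f_minus) / (2.0 * h)
    d2 = (f_plus - 2.0 * f(x) + f_minus) / (h * h)
    return d1, d2

def rescaled_step(f, x, ell, fallback_lr=1e-2):
    """One descent step whose length is ell times the local quadratic minimizer."""
    g = gradient(f, x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(x)  # already at a stationary point
    u = [-gi / norm for gi in g]  # unit descent direction
    d1, d2 = directional_derivatives(f, x, u)
    # Assumed rule: use curvature when it is positive, else a fixed small step.
    lr = ell * (-d1) / d2 if d2 > 0.0 else fallback_lr
    return [xi + lr * ui for xi, ui in zip(x, u)]

def run(f, x0, ell0=1.0, decay=0.9, steps=20):
    """Apply rescaled steps with an exponentially decaying factor ell0 * decay**k."""
    x, losses = list(x0), [f(x0)]
    for k in range(steps):
        x = rescaled_step(f, x, ell0 * decay ** k)
        losses.append(f(x))
    return x, losses

def quadratic(x):
    """Toy ill-conditioned objective used only for illustration."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

x_final, losses = run(quadratic, [1.0, 1.0])
```

On a quadratic objective, `ell = 1` lands exactly on the minimizer along u, and any `ell` strictly between 0 and 2 still decreases the loss, which is why the toy run above is monotone. How the paper's own rescaling factor maps to exploration versus convergence is not specified in this record, so the sketch makes no claim about it.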
Document type : Preprints, Working Papers, ...
Contributor : Alban Gossard
Submitted on : Tuesday, August 9, 2022 - 4:44:46 PM
Last modification on : Thursday, August 11, 2022 - 3:50:47 AM


Files produced by the author(s)


  • HAL Id : hal-03748574, version 1


Alban Gossard, Frédéric de Gournay. Adaptive scaling of the learning rate by second order automatic differentiation. 2022. ⟨hal-03748574⟩