Nonlinear Acceleration of Deep Neural Networks

Damien Scieur 1,2, Edouard Oyallon 3,4, Alexandre d'Aspremont 1,2, Francis Bach 1,2
1 SIERRA - Statistical Machine Learning and Parsimony, Inria de Paris
2 DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique
Abstract: Regularized nonlinear acceleration (RNA) is a generic extrapolation scheme for optimization methods with marginal computational overhead. It aims to improve convergence using only the iterates of simple iterative algorithms. So far, however, its application to optimization has been theoretically limited to gradient descent and other single-step algorithms. Here, we adapt RNA to a much broader setting that includes stochastic gradient with momentum and Nesterov's fast gradient. We use it to train deep neural networks and empirically observe that extrapolated networks are more accurate, especially in the early iterations. A straightforward application of our algorithm when training ResNet-152 on ImageNet produces a top-1 test error of 20.88%, improving on the reference classification pipeline by 0.8%. Furthermore, the extrapolation runs offline in this case, so it never negatively affects performance.
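The core RNA step is short: given iterates x_0, ..., x_k, weights c are obtained by solving a λ-regularized least-squares problem on the residual matrix R = [x_1 - x_0, ..., x_k - x_{k-1}] subject to Σ_i c_i = 1, and the extrapolated point is the weighted combination Σ_i c_i x_i. Below is a minimal NumPy sketch of this step, not the authors' released implementation; the function name `rna`, the default regularization `lam`, and the choice to combine the last k iterates are illustrative assumptions.

```python
import numpy as np

def rna(iterates, lam=1e-8):
    """Regularized Nonlinear Acceleration (minimal sketch).

    Combines iterates x_0, ..., x_k into a single extrapolated point
    sum_i c_i x_i, where the weights c minimize ||R c||^2 + lam * ||c||^2
    over the residuals R = [x_1 - x_0, ..., x_k - x_{k-1}], subject to
    sum_i c_i = 1.
    """
    X = np.stack(iterates, axis=1)           # shape (d, k+1)
    R = X[:, 1:] - X[:, :-1]                 # residual matrix, shape (d, k)
    RtR = R.T @ R
    RtR = RtR / np.linalg.norm(RtR, 2)       # normalize so lam is scale-free
    k = RtR.shape[0]
    # Solve (R^T R + lam*I) z = 1, then rescale so the weights sum to 1.
    z = np.linalg.solve(RtR + lam * np.eye(k), np.ones(k))
    c = z / z.sum()
    # Weighted combination of the last k iterates (an assumed convention;
    # combining x_0, ..., x_{k-1} instead differs only by a small residual term).
    return X[:, 1:] @ c
```

As a sanity check, on a strongly convex quadratic the extrapolated point is typically far closer to the optimum than the last gradient iterate:

```python
# Hypothetical example: accelerate plain gradient descent on a random quadratic.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = A @ A.T + np.eye(50)                     # positive-definite Hessian
b = rng.standard_normal(50)
x, step = np.zeros(50), 1.0 / np.linalg.norm(A, 2)
iterates = [x.copy()]
for _ in range(10):
    x = x - step * (A @ x - b)               # plain gradient step
    iterates.append(x.copy())
x_rna = rna(iterates)                        # usually much closer to A^{-1} b than x
```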
Document type:
Preprint / working paper
2018

https://hal.archives-ouvertes.fr/hal-01799269
Contributor: Damien Scieur
Submitted on: Thursday, May 24, 2018 - 15:01:21
Last modified on: Saturday, June 9, 2018 - 01:11:45

File

nonlinear_acceleration_of_deep...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01799269, version 1

Citation

Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach. Nonlinear Acceleration of Deep Neural Networks. 2018. ⟨hal-01799269⟩

Metrics

Record views: 30
File downloads: 28