An Inertial Newton Algorithm for Deep Learning

Abstract: We introduce a new second-order inertial optimization method for machine learning called INDIAN. It exploits the geometry of the loss function while requiring only stochastic approximations of function values and generalized gradients, which makes the method fully implementable and suited to large-scale optimization problems such as deep neural network training. The algorithm combines gradient-descent and Newton-like behaviors with inertia. We prove the convergence of INDIAN to critical points for most deep learning problems. To do so, we provide a framework well suited to the analysis of deep learning loss functions, based on tame optimization, in which we study the continuous-time dynamical system together with its discrete stochastic approximations. On the theoretical side, we also prove a sublinear convergence rate for the continuous-time differential inclusion underlying the algorithm. Empirically, the algorithm shows promising results on popular deep neural network training benchmarks.
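The abstract summarizes the algorithm without stating its update rule. Purely as an illustration, here is a minimal NumPy sketch of an INDIAN-style iteration, following our reading of the published discretization of the underlying inertial Newton dynamics; the hyperparameters alpha and beta, the step size, and the toy quadratic loss are our own choices, not taken from this record.

```python
# Minimal, illustrative sketch of an INDIAN-style update (not the authors'
# reference implementation). Assumed form: two coupled states (theta, psi)
# discretize the inertial Newton dynamics so that only a (stochastic)
# generalized-gradient estimate is needed -- no Hessian is ever formed.
import numpy as np

def indian_step(theta, psi, grad_estimate, alpha=0.5, beta=0.1, step=0.01):
    """One iteration on the state pair (theta, psi).

    alpha: viscous damping, beta: Newton-type damping (assumed roles,
    both > 0); grad_estimate: a mini-batch estimate of a generalized
    gradient of the loss at theta.
    """
    common = -(alpha - 1.0 / beta) * theta - (1.0 / beta) * psi
    theta_next = theta + step * (common - beta * grad_estimate)
    psi_next = psi + step * common
    return theta_next, psi_next

# Toy usage on f(theta) = 0.5 * ||theta||^2 (gradient: theta), with noise
# standing in for mini-batch stochasticity.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
psi = theta.copy()  # illustrative initialization of the auxiliary state
for _ in range(2000):
    noisy_grad = theta + 0.01 * rng.normal(size=5)
    theta, psi = indian_step(theta, psi, noisy_grad)
print(np.linalg.norm(theta))  # small: iterates approach the critical point 0
```

In this sketch the Newton-like character comes from the coupling of theta and psi rather than from an explicit Hessian, which is consistent with the abstract's claim that only function values and generalized gradients need to be approximated.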
Metadata

https://hal.archives-ouvertes.fr/hal-02140748
Contributor: Camille Castera
Submitted on: Thursday, December 12, 2019 - 2:00:08 AM
Last modification on: Monday, January 13, 2020 - 1:12:22 AM

Files

arXiv.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02140748, version 3
  • arXiv: 1905.12278

Citation

Camille Castera, Jérôme Bolte, Cédric Févotte, Edouard Pauwels. An Inertial Newton Algorithm for Deep Learning. 2019. ⟨hal-02140748v3⟩
