# An Inertial Newton Algorithm for Deep Learning

Affiliations: IRIT-SC (Signal et Communications) and IRIT-ADRIA (Argumentation, Décision, Raisonnement, Incertitude et Apprentissage), IRIT - Institut de Recherche en Informatique de Toulouse
Abstract: We introduce INNA, a new second-order inertial optimization method for machine learning. It exploits the geometry of the loss function while requiring only stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and suited to large-scale optimization problems such as the training of deep neural networks. The algorithm combines gradient-descent and Newton-like behaviors with inertia. We prove the convergence of INNA for most deep learning problems. To do so, we provide a well-suited framework for analyzing deep learning loss functions involving tame optimization, in which we study a continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion that underlies our algorithm. We also show how standard mini-batch optimization methods applied to non-smooth non-convex problems can yield a type of spurious stationary point not previously discussed. We address this issue by providing a theoretical framework around the new idea of $D$-criticality; we then give a simple asymptotic analysis of INNA. Our algorithm allows for an aggressive learning rate of $o(1/\log k)$. From an empirical viewpoint, we show that INNA returns competitive results with respect to the state of the art (stochastic gradient descent, ADAGRAD, ADAM) on popular deep learning benchmark problems.
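The abstract describes INNA only at a high level; this record does not give the update equations. The sketch below is therefore an illustrative reconstruction, not the authors' exact algorithm: it assumes a first-order reformulation of an inertial Newton-type dynamical system, with hypothetical hyperparameters `alpha` and `beta`, a step size `gamma`, and an auxiliary phase variable `psi` alongside the parameters `theta`. Note that only (stochastic) first-order information is used, matching the abstract's claim that no Hessian computations are required.

```python
import numpy as np

def inna_step(theta, psi, grad, alpha, beta, gamma):
    """One inertial Newton-like update (illustrative reconstruction).

    `grad` is a (possibly stochastic) gradient estimate; `psi` is an
    auxiliary variable carrying the inertial/second-order information.
    The exact discretization here is an assumption for illustration.
    """
    common = (alpha - 1.0 / beta) * theta + (1.0 / beta) * psi
    theta_new = theta + gamma * (-common - beta * grad)
    psi_new = psi + gamma * (-common)
    return theta_new, psi_new

# Toy demo: minimize f(theta) = 0.5 * ||A theta - b||^2 with noisy gradients.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)          # realizable least-squares target

theta = np.zeros(5)
psi = np.zeros(5)                       # auxiliary inertial variable
alpha, beta = 2.0, 1.0                  # hypothetical hyperparameter choices

loss0 = 0.5 * np.linalg.norm(A @ theta - b) ** 2
for k in range(2000):
    # Stochastic gradient estimate (exact gradient plus small noise).
    grad = A.T @ (A @ theta - b) + 0.01 * rng.standard_normal(5)
    # Step size decaying slightly faster than 1/log k, consistent with
    # the o(1/log k) learning-rate regime mentioned in the abstract.
    gamma = 0.01 / np.log(k + 3) ** 1.1
    theta, psi = inna_step(theta, psi, grad, alpha, beta, gamma)
loss_final = 0.5 * np.linalg.norm(A @ theta - b) ** 2
```

The key design point, as we read the abstract, is that Newton-like (curvature-driven) behavior is obtained without ever forming a Hessian: a Hessian-damped second-order flow can be rewritten as a first-order system in `(theta, psi)` because the curvature term is a time derivative of the gradient, so each iteration costs no more than a gradient step.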
Document type: Journal articles
https://hal.archives-ouvertes.fr/hal-02140748
Contributor: Camille Castera
Submitted on: Friday, August 20, 2021 - 2:28:19 PM
Last modification on: Monday, July 4, 2022 - 9:13:27 AM

### File

arxiv.pdf (files produced by the author(s))

### Identifiers

• HAL Id: hal-02140748, version 6
• arXiv: 1905.12278

### Citation

Camille Castera, Jérôme Bolte, Cédric Févotte, Edouard Pauwels. An Inertial Newton Algorithm for Deep Learning. Journal of Machine Learning Research, Microtome Publishing, 2021, 22 (134). ⟨hal-02140748v6⟩
