Preprint, Working Paper. Year: 2021

Large-time asymptotics in deep learning

Carlos Esteve-Yagüe
Borjan Geshkovski
Dario Pighin
Enrique Zuazua

Abstract

It is by now well known that practical deep supervised learning may roughly be cast as an optimal control problem for a specific discrete-time, nonlinear dynamical system called an artificial neural network. In this work, we consider the continuous-time formulation of the deep supervised learning problem and study its behavior as the final time horizon increases, which can be interpreted as increasing the number of layers in the neural network setting.

For the classical regularized empirical risk minimization problem, we show that, in long time, the optimal states approach the zero training error regime, whilst the optimal control parameters approach, on an appropriate scale, minimal-norm parameters whose corresponding states lie precisely in the zero training error regime. Seen from the large-layer perspective, this result provides an alternative theoretical underpinning to the notion that neural networks learn best in the overparametrized regime.

We also propose a learning problem consisting of minimizing a cost with a state tracking term, and establish the well-known turnpike property: over long time intervals, the solutions of the learning problem consist of three pieces, the first and the last being transient short-time arcs, and the middle being a long-time arc that stays exponentially close to the optimal solution of an associated static learning problem. This property in fact yields a quantitative estimate for the number of layers required to reach the zero training error regime. Both asymptotic regimes are addressed in the context of continuous-time and continuous space-time neural networks, the latter taking the form of nonlinear integro-differential equations, hence covering residual neural networks with both fixed and possibly variable depths.
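As a rough illustration of the two objectives discussed above, a minimal sketch, assuming the residual/neural-ODE parametrization standard in this literature (the symbols below are ours for illustration and are not quoted from the manuscript):

% Continuous-time formulation: each data point x_i evolves through the depth variable t.
\[
  \dot{x}_i(t) \;=\; \sigma\bigl(w(t)\,x_i(t) + b(t)\bigr),
  \qquad t \in (0,T), \qquad x_i(0) = \mathrm{x}_i,
  \qquad i = 1,\dots,N.
\]
% Regularized empirical risk minimization (the horizon T plays the role of the number of layers):
\[
  \min_{w,\,b}\;
  \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(P\,x_i(T),\, y_i\bigr)
  \;+\; \int_0^T \bigl\|\bigl(w(t), b(t)\bigr)\bigr\|^2 \,\mathrm{d}t.
\]
% Tracking-type cost, of the kind for which the turnpike property is established:
\[
  \min_{w,\,b}\;
  \int_0^T \Bigl(\tfrac{1}{N}\sum_{i=1}^{N} \bigl\|P\,x_i(t) - y_i\bigr\|^2
  \;+\; \bigl\|\bigl(w(t), b(t)\bigr)\bigr\|^2\Bigr)\,\mathrm{d}t.
\]

Here sigma is the activation, P a given output map, and l a loss; as T grows, optimal states of the first problem approach zero training error, while optimal trajectories of the second remain exponentially close to a static minimizer for most of [0, T].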
Main file
manuscript.pdf (5.63 MB)
Origin: files produced by the author(s)

Dates and versions

hal-02912516, version 1 (06-08-2020)
hal-02912516, version 2 (29-03-2021)

Identifiers

  • HAL Id: hal-02912516, version 2

Cite

Carlos Esteve-Yagüe, Borjan Geshkovski, Dario Pighin, Enrique Zuazua. Large-time asymptotics in deep learning. 2021. ⟨hal-02912516v2⟩
