Block-wise Training of Residual Networks via the Minimizing Movement Scheme - Archive ouverte HAL
Conference paper, Year: 2022

Block-wise Training of Residual Networks via the Minimizing Movement Scheme

Abstract

End-to-end backpropagation has several shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and it suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these issues and has been used for on-device training of neural networks. We develop a layer-wise training method, particularly well-adapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It alleviates the stagnation problem observed in layer-wise training, whereby greedily trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved by our method, whether the blocks are trained sequentially or in parallel.
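To make the idea concrete, here is a minimal sketch (our own illustration, not the authors' code) of greedily training a single residual block with an auxiliary linear classifier head, adding a kinetic-energy penalty on the block's displacement as the abstract describes. All names, the synthetic data, and the penalty weight `tau` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c, tau, lr = 256, 8, 3, 0.1, 0.5

x = rng.normal(size=(n, d))                 # block input features
y = rng.integers(0, c, size=n)              # class labels
Y = np.eye(c)[y]                            # one-hot targets

W = 0.01 * rng.normal(size=(d, d))          # residual branch g(x) = x @ W.T
H = 0.01 * rng.normal(size=(c, d))          # auxiliary local classifier head

def forward(W, H):
    v = x @ W.T                             # displacement ("velocity")
    h = x + v                               # residual block output x + g(x)
    logits = h @ H.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)       # softmax probabilities
    ce = -np.log(p[np.arange(n), y] + 1e-12).mean()
    kinetic = (v ** 2).sum(axis=1).mean()   # kinetic-energy regularizer
    return v, h, p, ce + tau * kinetic

losses = []
for _ in range(100):
    v, h, p, loss = forward(W, H)
    losses.append(loss)
    dlogits = (p - Y) / n                   # softmax cross-entropy gradient
    dH = dlogits.T @ h                      # head gradient
    dv = dlogits @ H + 2 * tau * v / n      # head path + kinetic penalty
    dW = dv.T @ x                           # residual-branch gradient
    W -= lr * dW
    H -= lr * dH
```

In the full block-wise setting, each block would be trained this way on the outputs of the previous (frozen or concurrently trained) block; the kinetic term penalizes large displacements, encouraging each block to be a regular, transport-like map.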
Main file
blockwise.pdf (901.75 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04108676, version 1 (28-05-2023)

Identifiers

  • HAL Id: hal-04108676, version 1

Cite

Skander Karkar, Ibrahim Ayed, Emmanuel de Bezenac, Patrick Gallinari. Block-wise Training of Residual Networks via the Minimizing Movement Scheme. 1st International Workshop on Practical Deep Learning in the Wild at the 36th AAAI Conference on Artificial Intelligence 2022, AAAI, Feb 2022, Vancouver, Canada. ⟨hal-04108676⟩
