Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

Eugene Belilovsky; Louis Leconte; Lucas Caccia; Michael Eickenberg; Edouard Oyallon

Preprint, Working Paper. Year: 2021

Eugene Belilovsky

Role: Author
PersonId : 1100884

Montreal Institute for Learning Algorithms [Montréal]

Concordia University [Montreal]

Louis Leconte

Role: Author
PersonId : 1100885

Machine Learning and Information Access

Centre de Mathématiques Appliquées - Ecole Polytechnique

Lucas Caccia

Role: Author
PersonId : 1100886

Montreal Institute for Learning Algorithms [Montréal]

McGill University = Université McGill [Montréal, Canada]

Michael Eickenberg

Role: Author
PersonId : 1100887

Flatiron Institute

Edouard Oyallon

Role: Author
PersonId : 179157
IdHAL : edouard-oyallon
ORCID : 0000-0002-4826-7527
IdRef : 228745500

Machine Learning and Information Access

Abstract

A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a classic greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing layers or modules in networks to be trained with a potentially linear parallelization. With the use of a replay buffer, we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update with possibly large communication delays. To address bandwidth and memory issues, we propose an approach based on online vector quantization. This drastically reduces the communication bandwidth between modules and the memory required for replay buffers. We show theoretically and empirically that this approach converges, and we compare it to the sequential solvers. We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset and on the large-scale ImageNet dataset.
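The decoupling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes two small fully connected modules (not CNNs), a binary logistic loss, plain SGD, and synthetic data, and every name in it (`GreedyModule`, `local_step`, `train`) is hypothetical. Each module owns an auxiliary classifier and updates only from its own local loss; its output is handed to the next module as plain numbers, so no gradient flows backward between modules. The replay buffer and vector quantization are not shown.

```python
import math
import random


class GreedyModule:
    """One layer h = tanh(W x) with its own auxiliary linear classifier v."""

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def local_step(self, x, y, lr):
        """Update W and v from the local logistic loss only; return (loss, output)."""
        h = self.forward(x)
        logit = sum(vi * hi for vi, hi in zip(self.v, h))
        p = 1.0 / (1.0 + math.exp(-logit))
        loss = -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
        d_logit = p - y  # derivative of the logistic loss w.r.t. the logit
        for k, row in enumerate(self.W):
            # Gradient w.r.t. the pre-activation, computed before v[k] changes.
            d_pre = d_logit * self.v[k] * (1.0 - h[k] ** 2)
            self.v[k] -= lr * d_logit * h[k]
            for i in range(len(row)):
                row[i] -= lr * d_pre * x[i]
        # h is passed on as constants: the next module never sends a gradient back.
        return loss, h


def train(modules, data, epochs=30, lr=0.1):
    """Run greedy local updates; return the mean local loss per module per epoch."""
    history = []
    for _ in range(epochs):
        totals = [0.0] * len(modules)
        for x, y in data:
            signal = x
            for j, module in enumerate(modules):
                loss, signal = module.local_step(signal, y, lr)
                totals[j] += loss
        history.append([t / len(data) for t in totals])
    return history


# Synthetic, linearly separable toy data (an assumption for illustration only).
rng = random.Random(0)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
data = [(x, 1 if x[0] + x[1] > 0 else 0) for x in points]

modules = [GreedyModule(2, 4, rng), GreedyModule(4, 4, rng)]
history = train(modules, data)
print("first epoch losses:", history[0])
print("last epoch losses: ", history[-1])
```

The inner loop here visits the modules one after another for simplicity. Because each update depends only on the module's own input and the label, the per-module steps could in principle run on separate workers, which is the parallelization the abstract refers to.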

Keywords

Greedy Learning; Asynchronous Distributed Optimization; Decoupled Optimization; Compression for Optimization

Domains

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]

Main file

example_paper.pdf (1.31 MB)

Origin: Files produced by the author(s)

Contributor: Edouard Oyallon

https://hal.science/hal-03247753

Submitted on: Thursday, June 10, 2021, 09:34:23

Last modified on: Friday, May 3, 2024, 13:43:44

Long-term archiving on: Saturday, September 11, 2021, 18:03:26

Dates and versions

hal-03247753, version 1 (10-06-2021)

Identifiers

HAL Id: hal-03247753, version 1
arXiv: 2106.06401

Cite

Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon. Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning. 2021. ⟨hal-03247753⟩

Collections

X CNRS INSMI X-CMAP X-DEP-MATHA CMAP LIP6 GENCI SORBONNE-UNIVERSITE SU-SCIENCES IP_PARIS

97 views

93 downloads
