Unsupervised Pre-Training of Image Features on Non-Curated Data

Mathilde Caron; Piotr Bojanowski; Julien Mairal; Armand Joulin

doi:10.1109/ICCV.2019.00305

Conference paper. Year: 2019

Mathilde Caron

Role: Author
PersonId : 1046708

Facebook AI Research [Paris]

Apprentissage de modèles à partir de données massives (Learning models from massive data)

Piotr Bojanowski

Role: Author
PersonId : 948453

Facebook AI Research [Paris]

Julien Mairal

Role: Author
PersonId : 1034832
ORCID : 0000-0001-6991-2110
IdRef : 152125256

Department of Statistics [Berkeley]

Apprentissage de modèles à partir de données massives (Learning models from massive data)

Armand Joulin

Role: Author
PersonId : 915272

Facebook AI Research [Paris]

Abstract

Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using non-curated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M [42], achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential of unsupervised learning when only non-curated raw data are available. We also show that pre-training a supervised VGG-16 with our method achieves 74.9% top-1 classification accuracy on the validation set of ImageNet, which is an improvement of +0.8% over the same network trained from scratch. Our code is available at https://github.com/facebookresearch/DeeperCluster.

Domains

Computer Science [cs]; Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [cs.LG]

Main file

main.pdf (2.22 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-02119564

Submitted on: Monday, September 9, 2019, 11:52:36

Last modified on: Thursday, April 4, 2024, 21:40:22

Dates and versions

hal-02119564, version 1 (3 May 2019)

hal-02119564, version 2 (9 September 2019)

Identifiers

HAL Id: hal-02119564, version 2
DOI : 10.1109/ICCV.2019.00305

Cite

Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin. Unsupervised Pre-Training of Image Features on Non-Curated Data. ICCV 2019 - International Conference on Computer Vision, Oct 2019, Seoul, South Korea. pp.2959-2968, ⟨10.1109/ICCV.2019.00305⟩. ⟨hal-02119564v2⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI INRIA2 LJK-GI-THOTH UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

256 views

930 downloads
