Learning Deep Hierarchical Visual Feature Coding

Hanlin Goh; Nicolas Thome; Matthieu Cord; Joo-Hwee Lim

doi:10.1109/TNNLS.2014.2307532

Journal article, IEEE Transactions on Neural Networks and Learning Systems, Year: 2014

Hanlin Goh

Role: Author

Nicolas Thome

Role: Author
PersonId : 181803
IdHAL : nicolas-thome
ORCID : 0000-0003-4871-3045
IdRef : 12878332X

Machine Learning and Information Access

Matthieu Cord

Role: Author
PersonId : 13617
IdHAL : matthieucord
ORCID : 0000-0002-0627-5844
IdRef : 132968126

Machine Learning and Information Access

Joo-Hwee Lim

Role: Author

Abstract

In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag-of-words framework with the representational power and adaptability of deep learning architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatially aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The learned visual representations are compact, and the model's inference is fast compared with sparse coding methods. The low-level descriptor representations learned with this method yield generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
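As a rough illustration of the coding step described in the abstract, the sketch below encodes local descriptors with the hidden layer of an RBM and aggregates the codes by max pooling. It is not the authors' implementation: the weights are random rather than learned, the descriptors are random stand-ins for SIFT, max pooling is an assumed choice of aggregation, and all function names and sizes are hypothetical. Reading "sparse" as few hidden units active per descriptor and "selective" as each hidden unit active for few descriptors is also an assumption made here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rbm_encode(descriptor, weights, hidden_bias):
    """Hidden-unit activation probabilities of an RBM for one descriptor."""
    codes = []
    for j, bias in enumerate(hidden_bias):
        z = bias + sum(x * weights[i][j] for i, x in enumerate(descriptor))
        codes.append(sigmoid(z))
    return codes


def max_pool(codes):
    """Element-wise maximum over a list of code vectors (assumed aggregation)."""
    return [max(column) for column in zip(*codes)]


def mean_activity_per_descriptor(codes):
    """Average activation across hidden units, one value per descriptor."""
    return [sum(code) / len(code) for code in codes]


def mean_activity_per_unit(codes):
    """Average activation across descriptors, one value per hidden unit."""
    return [sum(column) / len(column) for column in zip(*codes)]


# Hypothetical sizes: 128-dimensional descriptors (as for SIFT), 16 hidden units.
random.seed(0)
n_visible, n_hidden, n_descriptors = 128, 16, 10

# Random parameters stand in for a trained, regularized RBM.
weights = [[random.gauss(0.0, 0.1) for _ in range(n_hidden)]
           for _ in range(n_visible)]
hidden_bias = [-1.0] * n_hidden

# Random vectors stand in for descriptors extracted from one image region.
descriptors = [[random.random() for _ in range(n_visible)]
               for _ in range(n_descriptors)]

codes = [rbm_encode(d, weights, hidden_bias) for d in descriptors]
region_vector = max_pool(codes)
per_descriptor = mean_activity_per_descriptor(codes)
per_unit = mean_activity_per_unit(codes)
```

In a trained model, low values in `per_descriptor` would indicate sparse codes and low values in `per_unit` would indicate selective units; with the random parameters used here, the numbers carry no meaning beyond showing the shapes involved.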

Domains

Computer Science [cs]

https://hal.science/hal-01185465

Submitted on: Thursday, 20 August 2015, 11:35:15

Last modified on: Tuesday, 11 April 2023, 15:16:28

Dates and versions

hal-01185465 , version 1 (20-08-2015)

Identifiers

HAL Id : hal-01185465 , version 1
DOI : 10.1109/TNNLS.2014.2307532

Cite

Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim. Learning Deep Hierarchical Visual Feature Coding. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25 (12), pp.2212-2225. ⟨10.1109/TNNLS.2014.2307532⟩. ⟨hal-01185465⟩

Collections

UPMC CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
