Learning Deep Hierarchical Visual Feature Coding

Abstract: In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag of words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatial aggregating restricted Boltzmann machines (RBM). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact and the model's inference is fast compared with sparse coding methods. The low-level representations of descriptors learned using this method result in generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
Document type: Journal articles

https://hal.archives-ouvertes.fr/hal-01185465
Contributor: Lip6 Publications
Submitted on: Thursday, August 20, 2015 - 11:35:15 AM
Last modification on: Saturday, December 8, 2018 - 1:27:59 AM

Citation

Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim. Learning Deep Hierarchical Visual Feature Coding. IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2014, 25 (12), pp.2212-2225. 〈10.1109/TNNLS.2014.2307532〉. 〈hal-01185465〉
