Learning Deep Hierarchical Visual Feature Coding

Hanlin Goh, Nicolas Thome 1, Matthieu Cord 1, Joo-Hwee Lim
1 MLIA - Machine Learning and Information Access
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag of words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatially aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. Compared with sparse coding methods, the learned visual representations are compact and the model's inference is fast. The low-level descriptor representations learned with this method are generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
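As a rough illustration of the coding step described in the abstract, the sketch below encodes local descriptors with the hidden layer of an RBM and aggregates the codes over a region. It is a minimal sketch under stated assumptions, not the authors' implementation: the weights are random rather than learned, max-pooling is assumed as the aggregation operator, the function names (`rbm_encode`, `max_pool`, `encode_region`) are hypothetical, and the sparse and selective regularization and the supervised fine-tuning are not shown.

```python
import math
import random


def sigmoid(z):
    """Logistic function used for RBM hidden-unit activation probabilities."""
    return 1.0 / (1.0 + math.exp(-z))


def rbm_encode(descriptor, weights, hidden_bias):
    """Return hidden-unit activation probabilities for one descriptor.

    weights[i][j] connects visible unit i to hidden unit j.
    """
    n_hidden = len(hidden_bias)
    return [
        sigmoid(hidden_bias[j]
                + sum(x * weights[i][j] for i, x in enumerate(descriptor)))
        for j in range(n_hidden)
    ]


def max_pool(codes):
    """Element-wise maximum over a list of equal-length codes (assumed pooling)."""
    return [max(column) for column in zip(*codes)]


def encode_region(descriptors, weights, hidden_bias):
    """Code every descriptor in a region, then pool the codes into one vector."""
    codes = [rbm_encode(d, weights, hidden_bias) for d in descriptors]
    return max_pool(codes)


# Illustrative setup: 128 visible units matches the SIFT dimensionality;
# 8 hidden units and 5 descriptors are arbitrary choices for the demo.
rng = random.Random(0)
n_visible, n_hidden = 128, 8
weights = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
           for _ in range(n_visible)]
hidden_bias = [0.0] * n_hidden
descriptors = [[rng.random() for _ in range(n_visible)] for _ in range(5)]

region_code = encode_region(descriptors, weights, hidden_bias)
print(len(region_code))
```

In a hierarchical scheme of the kind the abstract describes, pooled codes such as `region_code` would serve as inputs to a second coding layer; that stacking is omitted here.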
Document type:
Journal article
IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2014, 25 (12), pp.2212-2225. 〈10.1109/TNNLS.2014.2307532〉

Contributor: Lip6 Publications
Submitted on: Thursday, August 20, 2015 - 11:35:15
Last modified on: Saturday, December 8, 2018 - 01:27:59





Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim. Learning Deep Hierarchical Visual Feature Coding. IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2014, 25 (12), pp.2212-2225. 〈10.1109/TNNLS.2014.2307532〉. 〈hal-01185465〉


