
Learning Deep Hierarchical Visual Feature Coding

Abstract: In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag of words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatially aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact, and the model's inference is fast compared with sparse coding methods. The low-level representations of descriptors learned using this method result in generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
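The coding pipeline summarized in the abstract can be illustrated with a minimal single-layer sketch. It is not the authors' implementation: the function names, the 64-unit codebook size, the 2 x 2 pooling grid, the use of max pooling as the bag-of-words aggregation step, and the random untrained weights are all assumptions made for illustration. The sparse and selective regularization and the supervised fine-tuning described above are not shown.

```python
import numpy as np

def rbm_encode(descriptors, weights, hidden_bias):
    """Hidden-unit activation probabilities of an RBM: sigmoid(x W + b)."""
    return 1.0 / (1.0 + np.exp(-(descriptors @ weights + hidden_bias)))

def max_pool(codes, positions, grid=2):
    """Max-pool codes over a grid x grid partition of the unit square.

    Max pooling is an assumed aggregation choice for this sketch.
    """
    n_hidden = codes.shape[1]
    pooled = np.zeros((grid, grid, n_hidden))
    # Map each normalized (y, x) position to its grid cell.
    cells = np.minimum((positions * grid).astype(int), grid - 1)
    for (row, col), code in zip(cells, codes):
        pooled[row, col] = np.maximum(pooled[row, col], code)
    return pooled.reshape(-1)

rng = np.random.default_rng(0)
descriptors = rng.random((50, 128))  # 50 stand-in descriptors with SIFT-like 128 dims
positions = rng.random((50, 2))      # normalized (y, x) locations in [0, 1)
weights = rng.normal(scale=0.01, size=(128, 64))  # untrained, illustrative weights
hidden_bias = np.zeros(64)

codes = rbm_encode(descriptors, weights, hidden_bias)  # one code per descriptor
image_vector = max_pool(codes, positions)              # one vector per image
print(image_vector.shape)  # (256,)
```

In a trained model the weights would come from RBM learning rather than random initialization, and the resulting image vector would be passed to a classifier.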
Document type: Journal article
Contributor: Lip6 Publications
Submitted on: Thursday, August 20, 2015 - 11:35:15 AM
Last modification on: Thursday, January 23, 2020 - 5:12:04 PM




Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim. Learning Deep Hierarchical Visual Feature Coding. IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2014, 25 (12), pp.2212-2225. ⟨10.1109/TNNLS.2014.2307532⟩. ⟨hal-01185465⟩


