Unsupervised and supervised visual codes with restricted Boltzmann machines

Abstract: The coding of local features (e.g. SIFT) for image categorization has recently been studied extensively. Within the Bag of Words (BoW) framework, these techniques optimize the projection of local features onto the visual codebook, leading to state-of-the-art performance on many benchmark datasets. In this work, we propose a novel visual codebook learning approach that uses the restricted Boltzmann machine (RBM) as the generative model. Our contribution is three-fold. First, we steer the unsupervised RBM learning with a regularization scheme that decomposes into a combined prior for the sparsity of each feature's representation and the selectivity of each codeword. The codewords are then fine-tuned to be discriminative through supervised learning from top-down labels. Second, we evaluate the proposed method on the Caltech-101 and 15-Scenes datasets, where it either matches or outperforms state-of-the-art results. The codebooks are compact and inference is fast. Finally, we introduce an original method to visualize the codebooks and decipher what each visual codeword encodes.
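
The two quantities named in the abstract, the sparsity of each feature's representation and the selectivity of each codeword, can be illustrated with a small sketch. It is not the authors' implementation: the paper's regularizer and training procedure are not given on this page, so the code only shows the standard RBM encoding step (a sigmoid of a linear projection) and measures mean activation along each axis of the resulting code matrix. The function names, the toy sizes, and the random weights and descriptors are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(features, weights, biases):
    """Map each local feature to its vector of codeword activations.

    features: list of N descriptors, each a list of D floats
    weights:  list of K codewords, each a list of D floats (hypothetical)
    biases:   list of K hidden biases (hypothetical)
    Returns an N x K list of activations in (0, 1).
    """
    return [
        [sigmoid(sum(w_d * x_d for w_d, x_d in zip(w, x)) + b)
         for w, b in zip(weights, biases)]
        for x in features
    ]


def row_means(codes):
    """Mean activation per feature (a low value means a sparse code)."""
    return [sum(row) / len(row) for row in codes]


def column_means(codes):
    """Mean activation per codeword (a low value means a selective codeword)."""
    n = len(codes)
    return [sum(row[k] for row in codes) / n for k in range(len(codes[0]))]


# Toy data: sizes are far smaller than a real SIFT codebook (assumption).
rng = random.Random(0)
D, K, N = 8, 4, 5
features = [[rng.random() for _ in range(D)] for _ in range(N)]
weights = [[rng.gauss(0.0, 0.1) for _ in range(D)] for _ in range(K)]
biases = [0.0] * K

codes = encode(features, weights, biases)
sparsity = row_means(codes)        # one value per feature
selectivity = column_means(codes)  # one value per codeword
```

Rows of `codes` correspond to local features and columns to codewords, so a regularizer of the kind the abstract describes would act on both axes of this matrix at once. With untrained random weights the activations sit near 0.5, which is neither sparse nor selective.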
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-00816428
Contributor: Hanlin Goh
Submitted on: Monday, April 22, 2013 - 11:31:37 AM
Last modification on: Thursday, March 21, 2019 - 2:22:00 PM
Long-term archiving on: Tuesday, July 23, 2013 - 4:12:41 AM

File

12_ECCV.pdf
Files produced by the author(s)

Identifiers

HAL Id: hal-00816428
DOI: 10.1007/978-3-642-33715-4_22

Citation

Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim. Unsupervised and supervised visual codes with restricted Boltzmann machines. 12th European Conference on Computer Vision (ECCV), Oct 2012, Florence, Italy. pp.298-311, ⟨10.1007/978-3-642-33715-4_22⟩. ⟨hal-00816428⟩
