Unsupervised and supervised visual codes with restricted Boltzmann machines

Hanlin Goh; Nicolas Thome; Matthieu Cord; Joo-Hwee Lim

doi:10.1007/978-3-642-33715-4_22

Conference paper, Year: 2012

Hanlin Goh

Role: Author
PersonId: 927512

Institute for Infocomm Research - I²R [Singapore]

Image & Pervasive Access Lab

Machine Learning and Information Retrieval

Nicolas Thome

Role: Author
PersonId: 181803
IdHAL: nicolas-thome
ORCID: 0000-0003-4871-3045
IdRef: 12878332X

Machine Learning and Information Retrieval

Matthieu Cord

Role: Author
PersonId: 13617
IdHAL: matthieucord
ORCID: 0000-0002-0627-5844
IdRef: 132968126

Machine Learning and Information Retrieval

Joo-Hwee Lim

Role: Author

Institute for Infocomm Research - I²R [Singapore]

Image & Pervasive Access Lab

Abstract

The coding of local features (e.g. SIFT) for image categorization has recently been studied extensively. Incorporated within the Bag of Words (BoW) framework, these techniques optimize the projection of local features into the visual codebook, leading to state-of-the-art performance on many benchmark datasets. In this work, we propose a novel visual codebook learning approach that uses the restricted Boltzmann machine (RBM) as its generative model. Our contribution is three-fold. Firstly, we steer the unsupervised RBM learning with a regularization scheme that decomposes into a combined prior for the sparsity of each feature's representation and the selectivity of each codeword. The codewords are then fine-tuned to be discriminative through supervised learning from top-down labels. Secondly, we evaluate the proposed method on the Caltech-101 and 15-Scenes datasets, either matching or outperforming state-of-the-art results. The codebooks are compact and inference is fast. Finally, we introduce an original method to visualize the codebooks and decipher what each visual codeword encodes.
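The distinction the abstract draws between sparsity (a property of each feature's representation) and selectivity (a property of each codeword) can be illustrated with a small sketch. This is not the authors' regularizer or training procedure; it only shows the standard RBM hidden-unit activation and the two axes along which average activation can be measured. The function names, the toy weights, and the toy input vectors are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(features, weights, biases):
    """Return RBM hidden-unit activation probabilities.

    Rows index local features, columns index codewords (hidden units).
    weights[k] is the hypothetical weight vector of codeword k.
    """
    return [
        [sigmoid(sum(w * x for w, x in zip(w_k, feat)) + b_k)
         for w_k, b_k in zip(weights, biases)]
        for feat in features
    ]

def mean_activation_per_feature(codes):
    # Averages across codewords: low values mean each feature
    # activates few codewords (a sparse representation).
    return [sum(row) / len(row) for row in codes]

def mean_activation_per_codeword(codes):
    # Averages across features: low values mean each codeword
    # responds to few features (a selective codeword).
    return [sum(col) / len(col) for col in zip(*codes)]

# Toy data (assumed for illustration): three 2-D features, two codewords.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[4.0, -4.0], [-4.0, 4.0]]
biases = [-2.0, -2.0]

codes = encode(features, weights, biases)
per_feature = mean_activation_per_feature(codes)
per_codeword = mean_activation_per_codeword(codes)

print(per_feature)
print(per_codeword)
```

A regularizer of the kind the abstract describes would push both sets of averages toward target values during training; how the two priors are combined is specified in the paper itself, not in this sketch.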

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

12_ECCV.pdf (282.49 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00816428

Submitted on: Monday, April 22, 2013, 11:31:37

Last modified on: Tuesday, November 7, 2023, 11:06:04

Long-term archiving on: Tuesday, July 23, 2013, 04:12:41

Dates and versions

hal-00816428, version 1 (22-04-2013)

Identifiers

HAL Id: hal-00816428, version 1
DOI: 10.1007/978-3-642-33715-4_22

Cite

Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim. Unsupervised and supervised visual codes with restricted Boltzmann machines. 12th European conference on Computer Vision, Oct 2012, Florence, Italy. pp.298-311, ⟨10.1007/978-3-642-33715-4_22⟩. ⟨hal-00816428⟩

Collections

UPMC CNRS IPAL LIP6 SORBONNE-UNIVERSITE SU-SCIENCES

287 views

600 downloads
