Bag-of-Words Image Representation: Key Ideas and Further Insight

Marc Teva Law; Nicolas Thome; Matthieu Cord

doi:10.1007/978-3-319-05696-8_2

Book chapter, Year: 2014


Marc Teva Law

Role: Author
PersonId: 972375

Machine Learning and Information Access

Nicolas Thome

Role: Author
PersonId: 181803
IdHAL: nicolas-thome
ORCID: 0000-0003-4871-3045
IdRef: 12878332X

Machine Learning and Information Access

Matthieu Cord

Role: Author
PersonId: 13617
IdHAL: matthieucord
ORCID: 0000-0002-0627-5844
IdRef: 132968126

Machine Learning and Information Access

Abstract

In object and scene recognition, state-of-the-art performance is obtained with visual Bag-of-Words (BoW) models, which build mid-level representations from densely sampled local descriptors (e.g., Scale-Invariant Feature Transform (SIFT)). Several methods for combining low-level features and setting mid-level parameters have recently been evaluated for image classification. In this chapter, we study in detail the different components of the BoW model in the context of image classification. In particular, we focus on the coding and pooling steps and investigate the impact of the main parameters of the BoW pipeline. We show that an adequate combination of several low-level parameters (sampling rate, multiscale extraction) and mid-level parameters (codebook size, normalization) is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors: low-contrast and high-contrast regions are pooled separately, and the results are combined to provide a powerful image representation. We study how classification performance is affected by the contrast threshold that determines whether a SIFT descriptor corresponds to a low-contrast or a high-contrast region. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
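The coding, pooling, normalization and contrast-split steps described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes hard-assignment coding against a fixed codebook, sum pooling, L2 normalization, and a hypothetical contrast measure (the L2 norm of the raw, unnormalized descriptor). The function names, the threshold value and the toy 2-D "descriptors" are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def contrast(descriptor: Sequence[float]) -> float:
    """Hypothetical contrast proxy: L2 norm of the raw descriptor
    (small gradient magnitudes give a small norm)."""
    return math.sqrt(sum(x * x for x in descriptor))


def hard_assign(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Coding step: index of the nearest visual word (hard assignment)."""
    d = l2_normalize(descriptor)  # code the normalized descriptor
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))


def pool_histogram(descriptors, codebook) -> Vector:
    """Pooling step: sum pooling of hard codes into a word-count histogram."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[hard_assign(desc, codebook)] += 1.0
    return hist


def split_contrast_bow(descriptors, codebook, threshold: float) -> Vector:
    """Pool low- and high-contrast descriptors separately, normalize each
    histogram, and concatenate them into one image representation."""
    low = [d for d in descriptors if contrast(d) < threshold]
    high = [d for d in descriptors if contrast(d) >= threshold]
    return (l2_normalize(pool_histogram(low, codebook))
            + l2_normalize(pool_histogram(high, codebook)))


# Toy example: a 2-word codebook and five 2-D "descriptors".
codebook = [[1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.0, 0.2], [0.0, 0.05], [3.0, 0.5], [2.0, 0.0]]
rep = split_contrast_bow(descriptors, codebook, threshold=1.0)
print(rep)
```

The resulting vector has twice the codebook size: the first half describes low-contrast regions and the second half high-contrast regions, so a downstream classifier can weight the two kinds of regions differently. Varying `threshold` in such a sketch mimics the kind of contrast-threshold study the abstract mentions.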

Domains

Computer Science [cs]


https://hal.science/hal-01221734

Submitted on: Wednesday, 28 October 2015, 14:42:07

Last modified on: Tuesday, 11 April 2023, 15:16:28

Dates and versions

hal-01221734, version 1 (28-10-2015)

Identifiers

HAL Id: hal-01221734, version 1
DOI: 10.1007/978-3-319-05696-8_2

Cite

Marc Teva Law, Nicolas Thome, Matthieu Cord. Bag-of-Words Image Representation: Key Ideas and Further Insight. Fusion in Computer Vision - Understanding Complex Visual Content, Springer, pp. 29-52, 2014, Advances in Computer Vision and Pattern Recognition, ⟨10.1007/978-3-319-05696-8_2⟩. ⟨hal-01221734⟩

Collections

UPMC CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
