Bag-of-Words Image Representation: Key Ideas and Further Insight

Marc Teva Law (1), Nicolas Thome (1), Matthieu Cord (1)
(1) MLIA - Machine Learning and Information Access
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: In the context of object and scene recognition, state-of-the-art performance is obtained with visual Bag-of-Words (BoW) models of mid-level representations computed from densely sampled local descriptors (e.g., Scale-Invariant Feature Transform (SIFT)). Several methods to combine low-level features and to set mid-level parameters have been evaluated recently for image classification. In this chapter, we study in detail the different components of the BoW model in the context of image classification. In particular, we focus on the coding and pooling steps and investigate the impact of the main parameters of the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors. Low-contrast and high-contrast regions are pooled separately and combined to provide a powerful representation of images. We study the impact on classification performance of the contrast threshold that determines whether a SIFT descriptor corresponds to a low-contrast region or a high-contrast region. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
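To make the coding and pooling steps mentioned in the abstract more concrete, the following is a minimal sketch and not the chapter's implementation. It assumes hard-assignment coding (nearest codeword), sum pooling, concatenation as the way the two pooled histograms are combined, L2 normalization, and the L2 norm of the unnormalized descriptor as a stand-in for contrast. None of these choices is stated in the abstract, and the function name `bow_contrast_split` and its parameters are hypothetical.

```python
import numpy as np


def bow_contrast_split(descriptors, codebook, contrast_threshold):
    """Illustrative BoW feature with separate low/high contrast pooling.

    descriptors: (n, d) array of local descriptors (e.g., unnormalized SIFT).
    codebook: (k, d) array of visual words.
    contrast_threshold: descriptors whose L2 norm is below this value are
        treated as low contrast (an assumption made for this sketch).
    Returns a (2 * k,) L2-normalized vector: [low-contrast | high-contrast].
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    k = codebook.shape[0]

    # Coding: hard assignment of each descriptor to its nearest codeword,
    # using squared Euclidean distance.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    # Contrast proxy (assumption): descriptor magnitude.
    contrast = np.linalg.norm(descriptors, axis=1)
    is_high = contrast >= contrast_threshold

    # Pooling: sum pooling (word counts), done separately per contrast group.
    hist_low = np.bincount(nearest[~is_high], minlength=k).astype(float)
    hist_high = np.bincount(nearest[is_high], minlength=k).astype(float)

    # Combination (assumption): concatenate, then L2-normalize.
    feature = np.concatenate([hist_low, hist_high])
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature
```

With a codebook of k words the resulting vector has 2k dimensions, so the contrast threshold controls how the descriptors are distributed between the two halves rather than changing the feature size.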
Document type:
Book chapter
Fusion in Computer Vision - Understanding Complex Visual Content, Springer, pp.29-52, 2014, Advances in Computer Vision and Pattern Recognition, 〈10.1007/978-3-319-05696-8_2〉

https://hal.archives-ouvertes.fr/hal-01221734
Contributor: Lip6 Publications
Submitted on: Wednesday, October 28, 2015 - 14:42:07
Last modified on: Thursday, November 22, 2018 - 14:09:35

Citation

Marc Teva Law, Nicolas Thome, Matthieu Cord. Bag-of-Words Image Representation: Key Ideas and Further Insight. Fusion in Computer Vision - Understanding Complex Visual Content, Springer, pp.29-52, 2014, Advances in Computer Vision and Pattern Recognition, 〈10.1007/978-3-319-05696-8_2〉. 〈hal-01221734〉