
Bag-of-Words Image Representation: Key Ideas and Further Insight

Marc Teva Law¹, Nicolas Thome¹, Matthieu Cord¹
¹ MLIA - Machine Learning and Information Access
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: In the context of object and scene recognition, state-of-the-art performance is obtained with visual Bag-of-Words (BoW) models of mid-level representations computed from densely sampled local descriptors (e.g., Scale-Invariant Feature Transform (SIFT)). Several methods for combining low-level features and setting mid-level parameters have recently been evaluated for image classification. In this chapter, we study in detail the different components of the BoW model in the context of image classification. In particular, we focus on the coding and pooling steps and investigate the impact of the main parameters of the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors: low- and high-contrast regions are pooled separately and then combined to provide a powerful representation of images. We study how classification performance is affected by the contrast threshold that determines whether a SIFT descriptor corresponds to a low-contrast or a high-contrast region. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
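The abstract names the main BoW stages (coding, pooling, normalization) and a contrast-based split of SIFT descriptors. The following is a minimal pure-Python sketch of these ideas under stated assumptions. It is not the authors' implementation. It uses hard-assignment coding to the nearest codeword, sum or max pooling, and L2 normalization. A hypothetical `split_contrast_bow` helper treats the L2 norm of the raw (unnormalized) descriptor as the contrast measure. The function names, the norm-based contrast proxy and the toy 2-D "descriptors" are illustrative assumptions. Real SIFT descriptors are 128-dimensional, and codebooks are typically learned with k-means.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> Vector:
    """Hard-assignment coding: a one-hot vector marking the nearest codeword."""
    k = min(range(len(codebook)), key=lambda j: sq_dist(descriptor, codebook[j]))
    code = [0.0] * len(codebook)
    code[k] = 1.0
    return code


def pool(codes: List[Vector], size: int, mode: str = "sum") -> Vector:
    """Sum pooling (histogram of codewords) or max pooling over all codes."""
    if not codes:
        return [0.0] * size
    if mode == "max":
        return [max(c[j] for c in codes) for j in range(size)]
    return [sum(c[j] for c in codes) for j in range(size)]


def bow_vector(descriptors, codebook, mode: str = "sum") -> Vector:
    """Code each (unit-normalized) descriptor, pool, then L2-normalize the result."""
    codes = [hard_assign(l2_normalize(d), codebook) for d in descriptors]
    return l2_normalize(pool(codes, len(codebook), mode))


def split_contrast_bow(descriptors, codebook, threshold: float, mode: str = "sum") -> Vector:
    """Hypothetical helper: pool low- and high-contrast descriptors separately and
    concatenate both BoW vectors. Assumption: contrast = L2 norm of the raw descriptor."""
    norm = lambda d: math.sqrt(sum(x * x for x in d))
    low = [d for d in descriptors if norm(d) < threshold]
    high = [d for d in descriptors if norm(d) >= threshold]
    return bow_vector(low, codebook, mode) + bow_vector(high, codebook, mode)


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-word codebook of unit vectors
    descs = [[0.1, 0.0], [3.0, 0.2], [0.0, 2.5], [0.0, 4.0]]  # toy raw descriptors
    print(split_contrast_bow(descs, codebook, threshold=1.0))  # approximately [1.0, 0.0, 0.447, 0.894]
```

For a codebook of K words, the concatenated output has 2K dimensions, with one half per contrast group. Moving `threshold` shifts descriptors between the two halves, which is the kind of parameter whose effect on classification the chapter studies.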
Document type: Book sections
Contributor: Lip6 Publications
Submitted on: Wednesday, October 28, 2015 - 2:42:07 PM
Last modification on: Friday, January 8, 2021 - 5:34:10 PM



Marc Teva Law, Nicolas Thome, Matthieu Cord. Bag-of-Words Image Representation: Key Ideas and Further Insight. Fusion in Computer Vision - Understanding Complex Visual Content, Springer, pp.29-52, 2014, Advances in Computer Vision and Pattern Recognition, ⟨10.1007/978-3-319-05696-8_2⟩. ⟨hal-01221734⟩


