Distances and weighting schemes for bag of visual words image retrieval - Archive ouverte HAL
Conference paper, Year: 2010

Distances and weighting schemes for bag of visual words image retrieval

Pierre Tirilly
Vincent Claveau
Patrick Gros

Abstract

Current text retrieval techniques make it possible to index and retrieve text documents very efficiently and with good accuracy. Image retrieval, by contrast, is still very coarse and does not yield satisfactory results. Computer vision researchers therefore try to benefit from text retrieval contributions to enhance their retrieval systems. In particular, Sivic and Zisserman, with their Video Google framework [1], propose a description of images similar to standard text descriptors: images are described by elementary image parts, called visual words. They can thus perform image retrieval using the standard Vector Space Model (VSM) of Information Retrieval (IR) and benefit from classical IR techniques such as inverted files. Among available text retrieval techniques, automatically finding the most relevant words to describe a document has been intensively studied for texts, but not for images. In this paper, we explore the use of term weighting techniques and classical distances from text retrieval in the case of images. The weights considered are standard VSM weights, weights derived from probabilistic models of IR, and new weighting schemes that we propose. Our experiments, performed on several datasets, show that no weighting scheme improves retrieval on every dataset, but also that choosing weights that fit the properties of the dataset can improve precision and MAP by up to 10 percent. This study provides insights into the semantic and statistical differences between textual and visual words, and into how visual word-based image retrieval systems can be optimized. It also shows some limits of the bag of visual words model, and the relation between Minkowski distances and local weighting. Finally, this study questions some experimental habits commonly found in the literature (choice of L1 distance, TF*IDF weights, and evaluation on a single dataset).
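
To make the terms in the abstract concrete, the following minimal Python sketch weights bag-of-visual-words histograms with one common textbook form of TF*IDF and ranks images by a Minkowski distance. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tf_idf`, `minkowski`, `rank`), the toy histograms, and the specific TF and IDF formulas are all hypothetical choices, and the paper studies several other weighting schemes that are not reproduced here.

```python
import math


def tf_idf(histograms):
    """Weight raw visual-word counts with a textbook TF*IDF variant (assumed form)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images in which each visual word occurs.
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    weighted = []
    for h in histograms:
        total = sum(h)
        row = []
        for w in range(n_words):
            # Local weight: relative frequency of the word in this image.
            tf = h[w] / total if total else 0.0
            # Global weight: words present in every image get weight 0.
            idf = math.log(n_images / df[w]) if df[w] else 0.0
            row.append(tf * idf)
        weighted.append(row)
    return weighted


def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 is L1, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def rank(query, database, p=1):
    """Return database indices sorted by increasing distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: minkowski(query, database[i], p))


# Toy data (hypothetical): 3 images described over a 3-word visual vocabulary.
histograms = [[2, 0, 1], [0, 3, 1], [2, 1, 1]]
weighted = tf_idf(histograms)
ranking = rank(weighted[0], weighted, p=1)
```

In this toy example the third visual word occurs in every image, so its IDF is zero and it no longer contributes to any distance. Changing `p` in `rank` is one way to experiment with how the choice of Minkowski distance interacts with the weights.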
Dates and versions

inria-00523975 , version 1 (06-10-2010)

Cite

Pierre Tirilly, Vincent Claveau, Patrick Gros. Distances and weighting schemes for bag of visual words image retrieval. ACM International Conference on Multimedia Information Retrieval, ACM, Mar 2010, Philadelphia, Pennsylvania, United States. pp.323-332, ⟨10.1145/1743384.1743438⟩. ⟨inria-00523975⟩