Pooling in Image Representation: the Visual Codeword Point of View

Sandra Avila; Nicolas Thome; Matthieu Cord; Eduardo Valle; Arnaldo de Albuquerque Araújo

doi:10.1016/j.cviu.2012.09.007

Journal article, Computer Vision and Image Understanding, Year: 2013

Sandra Avila

Role: Author
PersonId: 910690

Machine Learning and Information Access

Nicolas Thome

Role: Author
PersonId: 181803
IdHAL: nicolas-thome
ORCID: 0000-0003-4871-3045
IdRef: 12878332X

Machine Learning and Information Access

Matthieu Cord

Role: Author
PersonId: 13617
IdHAL: matthieucord
ORCID: 0000-0002-0627-5844
IdRef: 132968126

Machine Learning and Information Access

Eduardo Valle

Role: Author

Arnaldo de Albuquerque Araújo

Role: Author

Abstract

In this work, we propose BossaNova, a novel representation for content-based concept detection in images and videos, which enriches the Bag-of-Words model. Relying on the quantization of highly discriminant local descriptors by a codebook, and the aggregation of those quantized descriptors into a single pooled feature vector, the Bag-of-Words model has emerged as the most promising approach for concept detection on visual documents. BossaNova enhances that representation by keeping a histogram of distances between the descriptors found in the image and those in the codebook, thus preserving important information about the distribution of the local descriptors around each codeword. Unlike other approaches found in the literature, the non-parametric histogram representation is compact and simple to compute. BossaNova compares well with the state of the art on several standard datasets: MIRFLICKR, ImageCLEF 2011, PASCAL VOC 2007 and 15-Scenes, even without using complex combinations of different local descriptors. It also complements the cutting-edge Fisher Vector descriptors well, showing even better results when employed in combination with them. BossaNova also shows good results in the challenging real-world application of pornography detection.
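
The pooling idea described in the abstract (one histogram of descriptor-to-codeword distances per codeword) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `distance_histogram_pooling`, the parameters `n_bins` and `max_dist`, the use of the Euclidean distance, the single global distance range, and the normalization by the number of descriptors are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_histogram_pooling(
    descriptors: Sequence[Vector],
    codebook: Sequence[Vector],
    n_bins: int = 4,         # assumed: number of distance bins per codeword
    max_dist: float = 2.0,   # assumed: distances >= max_dist are ignored
) -> List[float]:
    """Pool local descriptors into one distance histogram per codeword.

    For every codeword, the distances from all descriptors to that codeword
    are quantized into n_bins equal-width bins over [0, max_dist). The
    per-codeword histograms are concatenated into a single vector of
    length len(codebook) * n_bins.
    """
    if n_bins <= 0 or max_dist <= 0:
        raise ValueError("n_bins and max_dist must be positive")

    bin_width = max_dist / n_bins
    n_desc = len(descriptors)
    pooled: List[float] = []

    for codeword in codebook:
        hist = [0.0] * n_bins
        for desc in descriptors:
            dist = euclidean(desc, codeword)
            if dist < max_dist:
                # Clamp to guard against floating-point rounding at the edge.
                idx = min(int(dist / bin_width), n_bins - 1)
                hist[idx] += 1.0
        if n_desc > 0:
            # Assumed normalization: fraction of descriptors per bin.
            hist = [h / n_desc for h in hist]
        pooled.extend(hist)

    return pooled


if __name__ == "__main__":
    toy_descriptors = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
    toy_codebook = [(0.0, 0.0), (0.0, 2.0)]
    print(distance_histogram_pooling(toy_descriptors, toy_codebook,
                                     n_bins=2, max_dist=2.0))
```

In contrast with a plain Bag-of-Words histogram, which stores one count per codeword, this sketch stores `n_bins` values per codeword, so descriptors that are near a codeword and descriptors that are far from it fall into different bins.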

Domains

Computer Science [cs]

https://hal.science/hal-01172709

Submitted on: Tuesday, July 7, 2015, 16:59:19

Last modified on: Tuesday, April 11, 2023, 15:16:28

Dates and versions

hal-01172709, version 1 (07-07-2015)

Identifiers

HAL Id: hal-01172709, version 1
DOI: 10.1016/j.cviu.2012.09.007

Cite

Sandra Avila, Nicolas Thome, Matthieu Cord, Eduardo Valle, Arnaldo de Albuquerque Araújo. Pooling in Image Representation: the Visual Codeword Point of View. Computer Vision and Image Understanding, 2013, 117 (5), pp.453-465. ⟨10.1016/j.cviu.2012.09.007⟩. ⟨hal-01172709⟩

Collections

UPMC CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES

171 views

0 downloads
