Context-aware CNNs for person head detection

Tuan-Hung Vu (1), Anton Osokin (2, 3), Ivan Laptev (1, 3)
1 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract: Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under a full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent local R-CNN object detector, we extend it with two types of contextual cues. First, we leverage person-scene relations and propose a Global CNN model trained to predict positions and scales of heads directly from the full image. Second, we explicitly model pairwise relations among objects and train a Pairwise CNN model using a structured-output surrogate loss. The Local, Global and Pairwise models are combined into a joint CNN framework. To train and test our full model, we introduce a large dataset composed of 369,846 human heads annotated in 224,740 movie frames. We evaluate our method and demonstrate improvements of person head detection against several recent baselines in three datasets. We also show improvements of the detection speed provided by our model.
Document type: Conference paper
ICCV 2015 - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile. Proceedings of IEEE International Conference on Computer Vision

https://hal.archives-ouvertes.fr/hal-01237618
Contributor: Anton Osokin
Submitted on: Thursday, December 3, 2015 - 15:08:31
Last modified on: Thursday, January 11, 2018 - 06:23:26

Identifiers

  • HAL Id: hal-01237618, version 1
  • arXiv: 1511.07917

Citation

Tuan-Hung Vu, Anton Osokin, Ivan Laptev. Context-aware CNNs for person head detection. ICCV 2015 - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile. Proceedings of IEEE International Conference on Computer Vision. 〈hal-01237618〉