On the Importance of Visual Context for Data Augmentation in Scene Understanding

Nikita Dvornik; Julien Mairal; Cordelia Schmid

doi:10.1109/TPAMI.2019.2961896

Journal article, IEEE Transactions on Pattern Analysis and Machine Intelligence, Year: 2019


Nikita Dvornik

Role: Author
PersonId : 1034811

Apprentissage de modèles à partir de données massives

Julien Mairal

Role: Author
PersonId : 1034832
ORCID : 0000-0001-6991-2110
IdRef : 152125256

Apprentissage de modèles à partir de données massives

Cordelia Schmid

Role: Author
PersonId : 831154

Apprentissage de modèles à partir de données massives

Abstract

Data augmentation is known to be important when training deep neural networks for visual recognition. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. While simple image transformations can already improve predictive performance in most vision tasks, larger gains can be obtained by leveraging task-specific knowledge. In this work, we consider object detection, semantic segmentation, and instance segmentation, and augment training images by blending objects into existing scenes, using instance segmentation annotations. We observe that randomly pasting objects onto images hurts performance unless the object is placed in the right context. To resolve this issue, we propose an explicit context model, based on a convolutional neural network, that predicts whether an image region is suitable for placing a given object. In our experiments, we show that our approach improves object detection, semantic segmentation, and instance segmentation on the PASCAL VOC12 and COCO datasets, with significant gains in a limited-annotation scenario. We also show that the method is not limited to datasets that come with expensive pixel-wise instance annotations: it can be used when only bounding boxes are available, by employing weakly supervised learning to approximate instance masks.
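The idea summarized in the abstract, pasting a segmented object into a scene only where the context is judged suitable, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `paste_object` and `augment`, the `context_score` callable (a stand-in for the paper's convolutional context model), the `threshold` value, and the simple alpha blending are all assumptions made for illustration.

```python
import numpy as np


def paste_object(scene, obj, mask, top, left):
    """Alpha-blend `obj` into a copy of `scene` at (top, left).

    `mask` is a 2-D array in [0, 1] with the same height and width as `obj`
    (1 = object pixel, 0 = background); the input scene is left unchanged.
    """
    h, w = mask.shape
    out = scene.astype(np.float32).copy()
    region = out[top:top + h, left:left + w]      # view into `out`
    alpha = mask.astype(np.float32)[..., None]    # broadcast over channels
    region[:] = alpha * obj.astype(np.float32) + (1.0 - alpha) * region
    return out.astype(scene.dtype)


def augment(scene, obj, mask, candidates, context_score, threshold=0.5):
    """Paste `obj` at the best-scoring candidate location.

    `context_score(scene, top, left, h, w)` is a hypothetical callable that
    returns a suitability score in [0, 1]; in the paper this role is played
    by a learned context model. Returns (augmented_image, (top, left)), or
    None when no in-bounds candidate reaches `threshold`.
    """
    h, w = mask.shape
    best, best_score = None, threshold
    for top, left in candidates:
        # Skip locations where the object would fall outside the image.
        if top < 0 or left < 0:
            continue
        if top + h > scene.shape[0] or left + w > scene.shape[1]:
            continue
        score = context_score(scene, top, left, h, w)
        if score >= best_score:
            best, best_score = (top, left), score
    if best is None:
        return None
    return paste_object(scene, obj, mask, *best), best
```

Returning `None` when no location passes the threshold reflects the abstract's observation that pasting objects in the wrong context hurts performance: under these assumptions, it is preferable to skip an augmentation than to place the object at random.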

Keywords

Convolutional Neural Networks; Visual Context; Semantic Segmentation; Data Augmentation; Object Detection

Domains

Computer Science [cs] / Computer Vision and Pattern Recognition [cs.CV]

Main file

main.pdf (3.59 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-01869784

Submitted on: Monday, January 6, 2020, 17:10:49

Last modified on: Thursday, April 4, 2024, 18:23:45

Long-term archiving on: Wednesday, April 8, 2020, 00:36:16

Dates and versions

hal-01869784 , version 1 (06-09-2018)

hal-01869784 , version 2 (13-05-2019)

hal-01869784 , version 3 (19-09-2019)

hal-01869784 , version 4 (06-01-2020)

Identifiers

HAL Id : hal-01869784 , version 4
DOI : 10.1109/TPAMI.2019.2961896

Cite

Nikita Dvornik, Julien Mairal, Cordelia Schmid. On the Importance of Visual Context for Data Augmentation in Scene Understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43 (6), pp.2014-2028. ⟨10.1109/TPAMI.2019.2961896⟩. ⟨hal-01869784v4⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI INRIA2 LJK-GI-THOTH UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

