Human Action Localization with Sparse Spatial Supervision

Philippe Weinzaepfel; Xavier Martin; Cordelia Schmid

Preprint, Working Paper, Year: 2017



Philippe Weinzaepfel

Role: Author

Xerox Research Centre Europe [Meylan]

Xavier Martin

Role: Author

Learning Models from Massive Data (Apprentissage de modèles à partir de données massives)

Cordelia Schmid

Role: Author

Learning Models from Massive Data (Apprentissage de modèles à partir de données massives)

Abstract

We introduce an approach for spatio-temporal human action localization using sparse spatial supervision. Our method leverages the large number of annotated humans available today and extracts human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Given these high-quality human tubes and temporal supervision, we select positive and negative tubes with very sparse spatial supervision, i.e., only one spatially annotated frame per instance. The selected tubes allow us to effectively learn a spatio-temporal action detector based on dense trajectories or CNNs. We conduct experiments on existing action localization benchmarks: UCF-Sports, J-HMDB and UCF-101. Our results show that our approach, despite using sparse spatial supervision, performs on par with methods using full supervision, i.e., one bounding box annotation per frame. To further validate our method, we introduce DALY (Daily Action Localization in YouTube), a dataset for realistic action localization in space and time. It contains high-quality temporal and spatial annotations for 3.6k instances of 10 actions in 31 hours of video (3.3M frames). It is an order of magnitude larger than existing datasets, with more diversity in appearance and long untrimmed videos.
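
The tube-selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a tube is a mapping from frame index to bounding box, and that a tube is labeled by how well its box on the single annotated frame overlaps the annotation (intersection over union). The names `iou` and `select_tubes`, the data layout, and the threshold values 0.5 and 0.3 are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Tube = Dict[int, Box]                    # frame index -> box, assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_tubes(
    tubes: List[Tube],
    annotated_frame: int,
    annotated_box: Box,
    pos_thresh: float = 0.5,  # hypothetical threshold, not from the paper
    neg_thresh: float = 0.3,  # hypothetical threshold, not from the paper
) -> Tuple[List[int], List[int]]:
    """Return indices of positive and negative tubes for one action instance.

    Only the single annotated frame is compared against the annotation.
    Tubes that do not cover that frame are skipped in this sketch, and
    tubes with an overlap between the two thresholds are left unlabeled.
    """
    positives: List[int] = []
    negatives: List[int] = []
    for idx, tube in enumerate(tubes):
        box = tube.get(annotated_frame)
        if box is None:
            continue
        overlap = iou(box, annotated_box)
        if overlap >= pos_thresh:
            positives.append(idx)
        elif overlap < neg_thresh:
            negatives.append(idx)
    return positives, negatives
```

Calling `select_tubes(tubes, frame, box)` once per annotated instance yields index lists that could then feed the training of an action detector. Leaving a gap between the two thresholds is one possible way to keep ambiguous tubes out of both sets.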

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

ActionLocSparse.pdf (3.38 MB)

brushing_B3-ywpMj4Jk.jpg (116.2 KB)

Origin: Files produced by the author(s)

Format: Figure, Image
Origin: Files produced by the author(s)

Contributor: THOTH Team

https://inria.hal.science/hal-01317558

Submitted on: Wednesday, May 24, 2017, 10:41:40

Last modified on: Thursday, April 4, 2024, 21:16:29

Dates and versions

hal-01317558, version 1 (18-05-2016)

hal-01317558, version 2 (24-05-2017)

Identifiers

HAL Id: hal-01317558, version 2
arXiv: 1605.05197

Cite

Philippe Weinzaepfel, Xavier Martin, Cordelia Schmid. Human Action Localization with Sparse Spatial Supervision. 2017. ⟨hal-01317558v2⟩


Collections

UGA CNRS INRIA LJK LJK_GI PERSYVAL-LAB INRIA2 LJK-GI-THOTH ANR

1137 Views

1502 Downloads
