Spatio-temporal multi-scale motion descriptor from a spatially-constrained decomposition for online action recognition

Fabio Martínez; Antoine Manzanera; Eduardo Romero

doi:10.1049/iet-cvi.2016.0055

Journal article, IET Computer Vision, Year: 2017

Fabio Martínez

Role: Author

Unité d'Informatique et d'Ingénierie des Systèmes

Antoine Manzanera

Role: Author
PersonId: 2800
IdHAL: antoine-manzanera
ORCID: 0000-0001-5718-411X
IdRef: 057787360

Unité d'Informatique et d'Ingénierie des Systèmes

Eduardo Romero

Role: Author
PersonId: 953890

Abstract

This work presents a spatio-temporal motion descriptor that is computed from a spatially-constrained decomposition and applied to online classification and recognition of human activities. The method starts by computing a multi-scale dense optical flow, which provides instantaneous velocity information for every pixel without explicit spatial regularization. Potential human actions are detected in each frame as spatially consistent moving regions and marked as Regions of Interest (RoIs). Each RoI is then sequentially partitioned into small overlapping subregions of different sizes, and each subregion is characterized by a set of flow orientation histograms. A given RoI is then described over time by a set of recursively computed statistics that collect information from the temporal history of the orientation histograms to form the action descriptor. At any time, the whole descriptor can be extracted and labelled by a previously trained support vector machine. The method was evaluated on three public datasets: (1) the VISOR dataset was used for two purposes: first, for global classification of short sequences containing individual actions, a task on which the method reached an average accuracy of 95% (sequence rate), and second, for recognition of multiple actions in long sequences, achieving an average per-frame accuracy of 92.3%; (2) the KTH dataset was used for global classification of activities; and (3) the UT datasets were used for evaluating the recognition task, obtaining an average accuracy of 80% (frame rate).
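The two central steps named in the abstract, a flow orientation histogram per subregion and statistics updated recursively over time, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which statistics are used, so the exponentially weighted running mean and variance below are an assumption, as are the names `orientation_histogram` and `RecursiveStats`, the magnitude weighting, and the parameters `n_bins`, `min_mag`, and `alpha`. Optical flow estimation, RoI detection, region partitioning, and the support vector machine are left out.

```python
import numpy as np


def orientation_histogram(flow, n_bins=8, min_mag=1e-3):
    """Magnitude-weighted histogram of flow orientations for one subregion.

    flow: array of shape (H, W, 2) holding (vx, vy) per pixel.
    Returns a histogram normalised to sum to 1 (all zeros if nothing moves).
    """
    vx = flow[..., 0].ravel()
    vy = flow[..., 1].ravel()
    mag = np.hypot(vx, vy)
    ang = np.arctan2(vy, vx) % (2 * np.pi)  # orientation mapped to [0, 2*pi]
    moving = mag > min_mag                  # ignore near-static pixels
    hist, _ = np.histogram(
        ang[moving], bins=n_bins, range=(0.0, 2 * np.pi), weights=mag[moving]
    )
    total = hist.sum()
    return hist / total if total > 0 else hist


class RecursiveStats:
    """Running mean and variance of histograms, updated one frame at a time.

    Assumption: exponential forgetting with factor `alpha`; the statistics
    used in the paper may differ.
    """

    def __init__(self, n_bins, alpha=0.1):
        self.alpha = alpha
        self.mean = np.zeros(n_bins)
        self.var = np.zeros(n_bins)
        self.started = False

    def update(self, hist):
        if not self.started:
            # First frame: initialise the mean, variance stays at zero.
            self.mean = np.asarray(hist, dtype=float).copy()
            self.started = True
        else:
            delta = hist - self.mean
            self.mean = self.mean + self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta**2)
        # Descriptor for this subregion at the current frame.
        return np.concatenate([self.mean, self.var])
```

Because each update depends only on the previous mean and variance, the descriptor is available at every frame without storing past histograms, which is the property that makes online labelling possible. In a full pipeline, the vectors from all subregions of an RoI would be concatenated before being passed to the classifier.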

Domains

Computer Vision and Pattern Recognition [cs.CV]; Image Processing [eess.IV]

Main file

iet-cvi17.pdf (4.37 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01671878

Submitted on: Friday, 22 December 2017, 16:47:14

Last modified on: Wednesday, 11 May 2022, 15:20:03

Dates and versions

hal-01671878, version 1 (22-12-2017)

Identifiers

HAL Id: hal-01671878, version 1
DOI: 10.1049/iet-cvi.2016.0055

Cite

Fabio Martínez, Antoine Manzanera, Eduardo Romero. Spatio-temporal multi-scale motion descriptor from a spatially-constrained decomposition for online action recognition. IET Computer Vision, 2017, 11 (7), pp. 541-549. ⟨10.1049/iet-cvi.2016.0055⟩. ⟨hal-01671878⟩

Collections

ENSTA ENSTA_U2IS UNIV-PARIS-SACLAY

26 views

163 downloads
