A multi-scale approach to gesture detection and recognition

Abstract : We propose a generalized approach to human gesture recognition based on multiple data modalities such as depth video, articulated pose and speech. In our system, each gesture is decomposed into large-scale body motion and local subtle movements such as hand articulation. The idea of learning at multiple scales is also applied to the temporal dimension, such that a gesture is considered as a set of characteristic motion impulses, or dynamic poses. Each modality is first processed separately in short spatio-temporal blocks, where discriminative data-specific features are either manually extracted or learned. Finally, we employ a Recurrent Neural Network for modeling large-scale temporal dependencies, data fusion and ultimately gesture classification. Our experiments on the 2013 Challenge on Multi-modal Gesture Recognition dataset demonstrate that using multiple modalities at several spatial and temporal scales leads to a significant increase in performance, allowing the model to compensate for errors of individual classifiers as well as noise in the separate channels.
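
The pipeline outlined in the abstract (per-modality features computed over short temporal blocks, then a recurrent network that fuses them and classifies the sequence) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy mean-and-motion descriptor, the block length, the feature sizes and the random, untrained weights are all assumptions chosen for illustration, and only a single temporal scale is shown.

```python
import numpy as np

def split_into_blocks(seq, block_len):
    """Cut a (T, D) sequence into non-overlapping blocks of block_len frames."""
    n_blocks = seq.shape[0] // block_len
    return seq[: n_blocks * block_len].reshape(n_blocks, block_len, seq.shape[1])

def block_features(blocks):
    """Toy hand-crafted descriptor (an assumption, not the paper's features):
    per-block mean plus mean absolute frame-to-frame difference."""
    mean = blocks.mean(axis=1)
    motion = np.abs(np.diff(blocks, axis=1)).mean(axis=1)
    return np.concatenate([mean, motion], axis=1)

def rnn_classify(features, n_classes, hidden=16, seed=0):
    """Simple Elman RNN with random, untrained weights.
    Consumes one fused block descriptor per step and returns class probabilities."""
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(features.shape[1], hidden))
    w_rec = rng.normal(scale=0.1, size=(hidden, hidden))
    w_out = rng.normal(scale=0.1, size=(hidden, n_classes))
    h = np.zeros(hidden)
    for x in features:
        h = np.tanh(x @ w_in + h @ w_rec)  # carry state across blocks
    logits = h @ w_out
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

# Hypothetical inputs: 40 frames of 6 pose coordinates and 4 audio features.
rng = np.random.default_rng(1)
pose = rng.normal(size=(40, 6))
audio = rng.normal(size=(40, 4))

# Each modality is processed separately per block, then fused by concatenation.
fused = np.concatenate(
    [block_features(split_into_blocks(m, block_len=5)) for m in (pose, audio)],
    axis=1,
)
probs = rnn_classify(fused, n_classes=3)
print(fused.shape, probs)
```

Concatenating the block descriptors before the recurrent step is only one possible fusion choice; it is used here because it keeps the sketch short.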
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01339262
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Wednesday, June 29, 2016 - 3:50:42 PM
Last modification on : Tuesday, February 26, 2019 - 4:35:38 PM

Citation

Natalia Neverova, Christian Wolf, Giulio Paci, Giacomo Sommavilla, Graham W. Taylor, et al. A multi-scale approach to gesture detection and recognition. ICCV Workshop on Understanding Human Activities: Context and Interactions (HACI 2013), Dec 2013, Sydney, Australia. pp.484-491, ⟨10.1109/ICCVW.2013.69⟩. ⟨hal-01339262⟩
