
A multi-scale approach to gesture detection and recognition

Abstract: We propose a generalized approach to human gesture recognition based on multiple data modalities such as depth video, articulated pose and speech. In our system, each gesture is decomposed into large-scale body motion and local subtle movements such as hand articulation. The idea of learning at multiple scales is also applied to the temporal dimension, so that a gesture is treated as a set of characteristic motion impulses, or dynamic poses. Each modality is first processed separately in short spatio-temporal blocks, where discriminative data-specific features are either manually extracted or learned. Finally, we employ a Recurrent Neural Network for modeling large-scale temporal dependencies, data fusion and ultimately gesture classification. Our experiments on the dataset of the 2013 Challenge on Multi-modal Gesture Recognition demonstrate that using multiple modalities at several spatial and temporal scales leads to a significant increase in performance, allowing the model to compensate for errors of individual classifiers as well as for noise in the separate channels.
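The temporal side of the multi-scale idea (short blocks of frames sampled at several time scales) can be illustrated with a minimal sketch. This is not the authors' implementation. The block length, the set of strides, the edge clamping and the function names `temporal_block` and `multiscale_blocks` are all hypothetical choices made for illustration only.

```python
from typing import Dict, List, Sequence

Frame = List[float]  # hypothetical layout: one feature vector per video frame


def temporal_block(frames: Sequence[Frame], center: int, length: int, stride: int) -> List[Frame]:
    """Return `length` frames sampled every `stride` frames around `center`.

    Assumption: indices outside the sequence are clamped to the first or
    last frame, so every block has the same size regardless of position.
    """
    half = length // 2
    last = len(frames) - 1
    block = []
    for k in range(-half, length - half):
        idx = min(max(center + k * stride, 0), last)
        block.append(frames[idx])
    return block


def multiscale_blocks(frames: Sequence[Frame], center: int, length: int = 5,
                      strides: Sequence[int] = (1, 2, 3)) -> Dict[int, List[Frame]]:
    """Build one block per temporal scale (hypothetical strides).

    Each block holds the same number of frames, but a larger stride
    spans a longer stretch of time around the same center frame.
    """
    return {s: temporal_block(frames, center, length, s) for s in strides}


if __name__ == "__main__":
    frames = [[float(i)] for i in range(10)]  # toy 1-D features: value = frame index
    for stride, block in multiscale_blocks(frames, center=5).items():
        print(stride, [f[0] for f in block])
```

With ten toy frames and center 5, stride 1 covers frames 3 to 7, stride 2 covers frames 1 to 9, and stride 3 reaches past the end of the sequence, so its last index is clamped to frame 9. Keeping the block size fixed across scales means a downstream per-block classifier can share one input shape for every scale.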
Document type: Conference papers
Contributor: Équipe Gestionnaire Des Publications Si Liris
Submitted on: Wednesday, June 29, 2016 - 3:50:42 PM
Last modification on: Thursday, November 21, 2019 - 2:24:27 AM



Natalia Neverova, Christian Wolf, Giulio Paci, Giacomo Sommavilla, Graham W. Taylor, et al. A multi-scale approach to gesture detection and recognition. ICCV Workshop on Understanding Human Activities: Context and Interactions (HACI 2013), Dec 2013, Sydney, Australia. pp. 484-491, ⟨10.1109/ICCVW.2013.69⟩. ⟨hal-01339262⟩


