Conference paper, Year: 2020

Features Understanding in 3D CNNs for Actions Recognition in Video

Kazi Ahmed Asif Fuad
  • Role: Author
Romain Giot
Romain Bourqui
  • Role: Author
  • PersonId : 1078969
Jenny Benois-Pineau
Akka Zemmari
  • Role: Author
  • PersonId : 1078970

Abstract

Human action recognition is one of the key tasks in video understanding. Deep Convolutional Neural Networks (CNNs) are often used for this purpose. Although they usually perform impressively, interpreting their decisions remains challenging. We propose a novel visual technique for understanding CNN features. Its objective is to find the salient features that played a key role in the network's decision. The technique uses only the features from the last convolutional layer before the fully connected layers of a trained model and builds an importance map of these features. The map is propagated back to the original frames, thus highlighting the regions in them that contribute to the final decision. The method is fast because, unlike many state-of-the-art methods, it does not require gradient computation. The proposed technique is applied to the Twin Spatio-Temporal 3D Convolutional Neural Network (TSTCNN), designed for table tennis action recognition. Feature visualization is performed on the RGB and optical flow branches of the network. The results are compared with other visualization techniques in terms of both human understanding and similarity metrics. The metrics show that the generated maps are similar to those obtained with the well-known Grad-CAM method: the Pearson Correlation Coefficient between the maps generated by Grad-CAM and by our method is 0.7 ± 0.05 on RGB data and 0.72 ± 0.06 on optical flow data.
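
The similarity metric named in the abstract, the Pearson Correlation Coefficient, can be illustrated with a minimal sketch. This is not the authors' code: the function name `pearson_correlation` and the random toy maps are assumptions made only for illustration, and the sketch assumes two saliency maps of identical shape stored as NumPy arrays.

```python
import numpy as np


def pearson_correlation(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Pearson Correlation Coefficient between two saliency maps of equal shape."""
    if map_a.shape != map_b.shape:
        raise ValueError("maps must have the same shape")
    # Work on flattened float copies so the caller's arrays are not modified.
    a = map_a.astype(np.float64).ravel()
    b = map_b.astype(np.float64).ravel()
    # Center both maps on their mean value.
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        raise ValueError("PCC is undefined for a constant map")
    return float((a * b).sum() / denom)


# Toy data (hypothetical): a reference map and a slightly perturbed copy.
rng = np.random.default_rng(0)
reference_map = rng.random((8, 8))
perturbed_map = reference_map + 0.1 * rng.random((8, 8))

print(pearson_correlation(reference_map, reference_map))
print(pearson_correlation(reference_map, perturbed_map))
```

The coefficient lies in [-1, 1] and is invariant to a positive rescaling or constant offset of either map, which makes it convenient for comparing maps produced by different visualization methods whose value ranges may differ.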
Main file
Features_Understanding_Method.pdf (1.51 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02963298, version 1 (10-10-2020)

Identifiers

  • HAL Id: hal-02963298, version 1

Cite

Kazi Ahmed Asif Fuad, Pierre-Etienne Martin, Romain Giot, Romain Bourqui, Jenny Benois-Pineau, et al. Features Understanding in 3D CNNs for Actions Recognition in Video. Tenth International Conference on Image Processing Theory, Tools and Applications, IPTA 2020, Oct 2020, Paris, France. ⟨hal-02963298⟩

Collections

CNRS
272 views
119 downloads
