Generalizing Adversarial Explanations with Grad-CAM

Tanmay Chakraborty; Utkarsh Trehan; Khawla Mallat; Jean-Luc Dugelay

Preprint, Working Paper. Year: 2022


Abstract

Gradient-weighted Class Activation Mapping (Grad-CAM) is an example-based explanation method that provides a gradient activation heatmap as an explanation for Convolutional Neural Network (CNN) models. The drawback of this method is that, being example-based, it cannot be used to generalize CNN behaviour. In this paper, we present a novel method that extends Grad-CAM from example-based explanations to a method for explaining global model behaviour. This is achieved by introducing two new metrics for model generalization: (i) Mean Observed Dissimilarity (MOD) and (ii) Variation in Dissimilarity (VID). These metrics are computed from a Normalized Inverted Structural Similarity Index (NISSIM), which compares the Grad-CAM heatmaps generated for samples from the original test set with those generated for samples from the adversarial test set. In our experiments, we study adversarial attacks generated with the Fast Gradient Sign Method (FGSM) on deep models such as VGG16, ResNet50, and ResNet101, and on wide models such as InceptionNetv3 and XceptionNet. We then compute MOD and VID for the automatic face recognition (AFR) use case with the VGGFace2 dataset. Across all models under adversarial attack, we observe a consistent shift in the region highlighted in the Grad-CAM heatmap, reflecting its participation in the decision making. The proposed method can be used to understand adversarial attacks and explain the behaviour of black-box CNN models for image analysis.
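The abstract names NISSIM, MOD, and VID without giving their formulas, so the following is only a minimal sketch of how such metrics could be computed, not the authors' implementation. It assumes that NISSIM inverts SSIM and rescales it from [-1, 1] to [0, 1], that MOD is the mean NISSIM over all heatmap pairs, and that VID is the population standard deviation of those scores. It also assumes heatmaps are flattened lists of values in [0, 1], and it replaces the usual sliding-window SSIM with a single global window to stay dependency-free. The function names `global_ssim`, `nissim`, and `mod_and_vid` are hypothetical.

```python
from math import sqrt


def global_ssim(x, y, dynamic_range=1.0):
    """SSIM over one global window of two flattened heatmaps (simplification)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("heatmaps must be non-empty and of equal size")
    n = len(x)
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def nissim(clean_heatmap, adversarial_heatmap):
    """ASSUMED form: invert SSIM and rescale it from [-1, 1] to [0, 1]."""
    return (1.0 - global_ssim(clean_heatmap, adversarial_heatmap)) / 2.0


def mod_and_vid(heatmap_pairs):
    """ASSUMED forms: MOD = mean NISSIM, VID = population standard deviation."""
    scores = [nissim(clean, adv) for clean, adv in heatmap_pairs]
    if not scores:
        raise ValueError("at least one (clean, adversarial) pair is required")
    mod = sum(scores) / len(scores)
    vid = sqrt(sum((s - mod) ** 2 for s in scores) / len(scores))
    return mod, vid


# Toy 2x2 heatmaps, flattened: the attention region moves to the other half.
clean = [0.0, 0.0, 1.0, 1.0]
shifted = [1.0, 1.0, 0.0, 0.0]
mod, vid = mod_and_vid([(clean, clean), (clean, shifted)])
print(f"MOD={mod:.3f} VID={vid:.3f}")
```

Under these assumptions, a score near 0 means the adversarial heatmap matches the clean one, and a score near 1 means the highlighted region has moved almost entirely. In practice the heatmap pairs would come from running Grad-CAM on each test image and on its FGSM-perturbed counterpart.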

Domains

Computer-aided engineering

Main file

publi-6871.pdf (6.47 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-03644916

Submitted on: Wednesday, 22 June 2022, 17:01:06

Last modified on: Wednesday, 22 June 2022, 17:02:51

Dates and versions

hal-03644916 , version 1 (22-06-2022)

License

Attribution

Identifiers

HAL Id : hal-03644916 , version 1
ARXIV : 2204.05427

Cite

Tanmay Chakraborty, Utkarsh Trehan, Khawla Mallat, Jean-Luc Dugelay. Generalizing Adversarial Explanations with Grad-CAM. 2022. ⟨hal-03644916⟩


Collections

EURECOM ANR

