Evaluating self-attention interpretability through human-grounded experimental protocol
Conference paper, 2023

Evaluating self-attention interpretability through human-grounded experimental protocol

Abstract

Attention mechanisms have played a crucial role in the development of complex architectures such as Transformers in natural language processing. However, Transformers remain hard to interpret and are considered black boxes. In this paper we assess how attention coefficients from Transformers, when properly aggregated, help provide classifier interpretability. A fast and easy-to-implement way of aggregating attention is proposed to build local feature importance. A human-grounded experiment is conducted to evaluate this approach and compare it to other usual interpretability methods. The experimental protocol relies on the capacity of an interpretability method to provide explanations in line with human reasoning. The experiment design includes measuring reaction times and correct response rates of human subjects. Attention performs comparably to usual interpretability methods and significantly better than a random baseline with respect to average participant reaction time and accuracy. Moreover, data analysis highlights that predictions made with high probability yield more relevant explanations. This work shows how self-attention can be aggregated and used to explain Transformer classifiers. The low computational cost of attention compared to other interpretability methods and its availability by design within Transformer classifiers make it particularly beneficial. Finally, the quality of its explanations depends strongly on the certainty of the associated classifier prediction.
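To illustrate the kind of attention aggregation described in the abstract, here is a minimal sketch, not the authors' exact procedure, assuming a Hugging Face BERT-style sequence classifier: attention from the [CLS] token is averaged over layers and heads to yield a per-token importance score.

```python
# Minimal sketch of aggregating self-attention into local feature importance.
# Assumptions (not from the paper): a BERT-style classifier via Hugging Face
# transformers, and uniform averaging over layers and heads.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # hypothetical backbone choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, output_attentions=True
)
model.eval()

def attention_importance(text: str):
    """Return (token, score) pairs from aggregated self-attention."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: tuple of per-layer tensors, each (batch, heads, seq, seq)
    attn = torch.stack(outputs.attentions)    # (layers, batch, heads, seq, seq)
    attn = attn.mean(dim=(0, 2)).squeeze(0)   # average layers and heads -> (seq, seq)
    cls_row = attn[0]                         # attention from [CLS] to every token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return list(zip(tokens, cls_row.tolist()))

# Usage example: tokens with the highest aggregated attention scores.
scores = attention_importance("The movie was surprisingly good.")
print(sorted(scores, key=lambda kv: -kv[1])[:5])
```

Because the attention weights are produced during the forward pass anyway, this kind of explanation comes at essentially no extra computational cost, unlike perturbation- or gradient-based methods.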
Main file: Evaluating_self_attention_interpretability_through_human_grounded_experimental_protocol___Springer_xAI.pdf (1.07 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04311790, version 1 (28-11-2023)

Identifiers

Cite

Milan Bhan, Nina Achache, Victor Legrand, Annabelle Blangero, Nicolas Chesneau. Evaluating self-attention interpretability through human-grounded experimental protocol. First World Conference on Explainable Artificial Intelligence (xAI), Jul 2023, Lisbon, Portugal. pp. 26-46, ⟨10.1007/978-3-031-44070-0_2⟩. ⟨hal-04311790⟩