Variance-based importance measures for machine learning model interpretability

Machine learning algorithms are enjoying an unprecedented boom in industry, in particular as decision-support tools for critical systems. However, their lack of “interpretability” remains an obstacle to overcome before these tools can be made fully intelligible and auditable. This paper surveys and synthesizes a panel of interpretability metrics (called “importance measures”) that aim to quantify the impact of each predictor on the variance of the statistical model’s output. It is shown that the choice of a relevant metric has to be guided by the constraints imposed by the data and the model under consideration (linear vs. nonlinear phenomenon of interest, input dimension, input dependency), as well as by the type of study the user wants to perform (detecting influential variables, ranking them, etc.). Finally, these metrics are estimated and analyzed on a public dataset to illustrate some of their theoretical and empirical properties.
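To make the idea of a variance-based importance measure concrete, here is a minimal sketch that estimates first-order Sobol' indices, S_i = Var(E[Y | X_i]) / Var(Y), with a pick-freeze style Monte Carlo scheme. It is an illustration only and is not taken from the paper: the toy linear model, the independent standard normal inputs, and all function and variable names are assumptions chosen so that the exact answer (0.8, 0.2, 0) is known in advance.

```python
import numpy as np


def toy_model(x):
    # Hypothetical linear model Y = 2*X1 + 1*X2 + 0*X3 (an assumption, not the paper's model).
    return 2.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]


def sample_inputs(n, rng):
    # Assumption: three independent standard normal predictors.
    return rng.standard_normal((n, 3))


def first_order_sobol(model, sampler, n, rng):
    """Pick-freeze estimate of first-order Sobol' indices for independent inputs."""
    a = sampler(n, rng)  # first independent sample
    b = sampler(n, rng)  # second independent sample
    y_a, y_b = model(a), model(b)
    var_y = np.var(np.concatenate([y_a, y_b]))
    indices = np.empty(a.shape[1])
    for i in range(a.shape[1]):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]  # sample A with column i taken from B
        y_ab = model(ab_i)
        # y_b and y_ab share only input i, so this mean estimates Var(E[Y | X_i]).
        indices[i] = np.mean(y_b * (y_ab - y_a)) / var_y
    return indices


rng = np.random.default_rng(0)
estimates = first_order_sobol(toy_model, sample_inputs, 200_000, rng)
print(np.round(estimates, 2))
```

For this additive model with independent inputs, the exact indices are 4/5, 1/5 and 0, and they sum to one. That clean decomposition relies on the independence assumption; when predictors are dependent, an estimator of this kind no longer applies as written, which is one of the constraints on metric choice that the abstract points to.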


Keywords

statistical learning, interpretability, sensitivity analysis, Shapley effects, Sobol' indices

Domains

Statistics [math.ST]

Origin: Files produced by the author(s)

https://hal.science/hal-03741384

Submitted on: Monday, August 1, 2022, 11:40:15

Last modified on: Thursday, April 11, 2024, 13:16:13

Long-term archiving on: Wednesday, November 2, 2022, 18:38:06

Dates and versions

hal-03741384, version 1 (01-08-2022)

Identifiers

HAL Id: hal-03741384, version 1

Cite

Bertrand Iooss, Vincent Chabridon, Vincent Thouvenot. Variance-based importance measures for machine learning model interpretability. Actes du Congrès λμ23, Oct 2022, Saclay, France. ⟨hal-03741384⟩

Collections

UNIV-TLSE2 CNRS INSA-TOULOUSE IMT UT1-CAPITOLE EDF INSA-GROUPE UNIV-UT3 UT3-TOULOUSEINP
