Influence Measures for CART Classification Trees

Abstract: This paper deals with measuring the influence of observations on the results obtained with CART classification trees. To define the influence of individuals on the analysis, we use influence functions to propose general criteria measuring the sensitivity of the CART analysis and its robustness. The proposals, based on jackknife trees, are organized along two lines: influence on predictions and influence on partitions. In addition, the analysis is extended to the pruned sequences of CART trees to produce a CART-specific notion of influence. A numerical example, the well-known spam dataset, is presented to illustrate the notions developed throughout the paper. Finally, a real dataset relating the administrative classification of cities surrounding Paris, France, to the characteristics of their tax revenue distribution is analyzed using the new influence-based tools.
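The jackknife idea behind the "influence on predictions" line can be illustrated with a minimal sketch. It is not the authors' code: it assumes a small, unpruned Gini-based binary tree written from scratch as a stand-in for CART, and it measures the influence of observation i simply as the number of observations whose predicted label changes when the tree is regrown without i. This is one plausible instance of the criteria described above, not necessarily the paper's exact definition, and all function names (`fit_tree`, `jackknife_prediction_influence`, etc.) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair minimising the
    size-weighted Gini impurity of the two children; None if no split exists."""
    n, best = len(y), None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y[k] for k in range(n) if X[k][j] <= t]
            right = [y[k] for k in range(n) if X[k][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def fit_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Grow an unpruned binary tree: a dict for internal nodes, a label for leaves."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:  # all feature values identical: cannot split
        return majority
    _, j, t = split

    def grow(idx):
        # Recurse on the subset of observations sent to one child
        return fit_tree([X[k] for k in idx], [y[k] for k in idx],
                        depth + 1, max_depth, min_size)

    left = [k for k in range(len(y)) if X[k][j] <= t]
    right = [k for k in range(len(y)) if X[k][j] > t]
    return {"feature": j, "threshold": t, "left": grow(left), "right": grow(right)}


def predict(tree, x):
    """Route observation x down to a leaf and return its label."""
    while isinstance(tree, dict):
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


def jackknife_prediction_influence(X, y, **tree_params):
    """For each observation i, regrow the tree without i (jackknife tree) and
    count how many of the n observations get a different predicted label
    than under the tree grown on the full sample."""
    full_tree = fit_tree(X, y, **tree_params)
    base = [predict(full_tree, x) for x in X]
    scores = []
    for i in range(len(y)):
        tree_i = fit_tree(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **tree_params)
        scores.append(sum(predict(tree_i, x) != b for x, b in zip(X, base)))
    return scores


# Toy example: one feature, four observations, two classes
print(jackknife_prediction_influence([[0], [1], [2], [3]], [0, 0, 1, 1]))
# → [0, 0, 1, 0]
```

In this toy example, only leaving out the observation at x = 2 moves the midpoint threshold (from 1.5 to 2), which flips that point's own prediction. A real CART analysis would also involve pruning, to which the paper extends its influence notions; the sketch deliberately omits it.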
Document type:
Journal article
Journal of Classification, Springer Verlag, 2015, 32 (1), pp. 21-45. DOI: 10.1007/s00357-015-9172-4

https://hal.archives-ouvertes.fr/hal-00562039
Contributor: Servane Gey
Submitted on: Wednesday, 2 February 2011, 16:14:59
Last modified on: Tuesday, 10 October 2017, 11:22:04
Archived on: Tuesday, 3 May 2011, 03:27:23

File

cart.influence.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License


Citation

Avner Bar-Hen, Servane Gey, Jean-Michel Poggi. Influence Measures for CART Classification Trees. Journal of Classification, Springer Verlag, 2015, 32 (1), pp. 21-45. DOI: 10.1007/s00357-015-9172-4. hal-00562039
