Risk Bounds for Embedded Variable Selection in Classification Trees

Abstract: The problems of model selection and variable selection for classification trees are considered jointly. A penalized criterion is proposed that explicitly takes the number of variables into account, and a risk bound is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion.
Document type:
Journal article
IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2014, 60 (3), pp.1688-1699

https://hal.archives-ouvertes.fr/hal-00613041
Contributor: Servane Gey
Submitted on: Friday, 22 June 2012 - 16:44:50
Last modified on: Wednesday, 19 July 2017 - 16:36:28
Document(s) archived on: Friday, 31 March 2017 - 09:48:54

Files

GeyMaryHuard_arxivV2.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License

Citation

Servane Gey, Tristan Mary-Huard. Risk Bounds for Embedded Variable Selection in Classification Trees. IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2014, 60 (3), pp.1688-1699. <hal-00613041v2>

Metrics

Record views: 330
Document downloads: 126