Risk Bounds for Embedded Variable Selection in Classification Trees - HAL open archive
Journal article, IEEE Transactions on Information Theory, Year: 2014

Abstract

The problems of model selection and variable selection for classification trees are considered jointly. A penalized criterion is proposed that explicitly takes the number of variables into account, and a risk bound is established for the tree classifier minimizing this criterion. This penalized criterion is compared with the one used in the pruning step of the CART algorithm, and the two criteria are shown to be similar under specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies confirm that the hold-out procedure mimics the form of the proposed penalized criterion.
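
As an illustration of the hold-out calibration mentioned in the abstract, the sketch below uses the standard CART cost-complexity form (training error plus alpha times the number of leaves) and picks alpha by hold-out error. It is a minimal sketch under stated assumptions, not the authors' procedure: the `Subtree` class, the function names, the candidate subtrees, and all error values are hypothetical, and the variable-count term of the paper's penalty is not reproduced because the abstract does not give its form.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    """Hypothetical summary of one pruned subtree (not from the paper)."""
    leaves: int           # number of leaves |T|
    train_error: float    # misclassification rate on the learning sample
    holdout_error: float  # misclassification rate on the hold-out sample


def cart_criterion(tree, alpha):
    """Standard cost-complexity criterion: training error + alpha * leaves."""
    return tree.train_error + alpha * tree.leaves


def select_for_alpha(trees, alpha):
    """Subtree minimizing the criterion for a fixed alpha (ties go to the smaller tree)."""
    return min(trees, key=lambda t: (cart_criterion(t, alpha), t.leaves))


def calibrate_alpha_by_holdout(trees, alphas):
    """Alpha whose selected subtree has the lowest hold-out error."""
    return min(alphas, key=lambda a: select_for_alpha(trees, a).holdout_error)


# Made-up toy values, for illustration only.
trees = [
    Subtree(leaves=1, train_error=0.40, holdout_error=0.41),
    Subtree(leaves=3, train_error=0.20, holdout_error=0.24),
    Subtree(leaves=6, train_error=0.12, holdout_error=0.22),
    Subtree(leaves=12, train_error=0.05, holdout_error=0.27),
]
alphas = [0.0, 0.01, 0.02, 0.05, 0.2]

best_alpha = calibrate_alpha_by_holdout(trees, alphas)
best_tree = select_for_alpha(trees, best_alpha)
print(best_alpha, best_tree.leaves)
```

With these toy values, small alphas keep the largest tree and large alphas collapse to the root, so the hold-out step settles on an intermediate subtree.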
Main file
GeyMaryHuard_arxivV2.pdf (403.69 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00613041 , version 1 (02-08-2011)
hal-00613041 , version 2 (22-06-2012)

License

Attribution - NonCommercial - ShareAlike

Cite

Servane Gey, Tristan Mary-Huard. Risk Bounds for Embedded Variable Selection in Classification Trees. IEEE Transactions on Information Theory, 2014, 60 (3), pp.1688-1699. ⟨hal-00613041v2⟩
