Classification tree algorithm for grouped variables - HAL open archive
Preprint, working paper. Year: 2018


Abstract

We consider the problem of predicting a categorical variable based on groups of inputs. Some methods have already been proposed to build classification rules based on groups of variables (e.g. group lasso for logistic regression). However, to our knowledge, no tree-based approach has been proposed to tackle this issue. Here, we propose the Tree Penalized Linear Discriminant Analysis algorithm (TPLDA), a new tree-based approach which constructs a classification rule based on groups of variables. It consists of splitting a node by repeatedly selecting a group and then applying a regularized linear discriminant analysis based on this group. This process is repeated until a stopping criterion is satisfied. A pruning strategy is proposed to select an optimal tree. Compared to existing multivariate classification tree methods, the proposed method is computationally less demanding and the resulting trees are more easily interpretable. Furthermore, TPLDA automatically provides a measure of importance for each group of variables. This score makes it possible to rank groups of variables with respect to their ability to predict the response, and it can also be used to perform group variable selection. The good performance of the proposed algorithm and its value in terms of prediction accuracy, interpretation and group variable selection are illustrated through simulations and applications on real datasets, and compared with alternative reference methods.
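The node-splitting idea described in the abstract (pick a group, fit a regularized LDA on that group only, split along the resulting discriminant direction, recurse, and credit each group with the impurity it removes) can be sketched roughly as follows. This is a minimal illustration under several loud assumptions, not the authors' TPLDA implementation: the response is binary; a simple covariance-shrinkage LDA stands in for the penalized LDA used in the paper; the group chosen at each node is the one with the largest Gini impurity decrease; group importance is taken to be the sum of node-size-weighted Gini decreases; and pruning is omitted. All function names (`shrinkage_lda_direction`, `best_group_split`, `grow`, `predict_one`) are hypothetical.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p1 = y.mean()
    return 2 * p1 * (1 - p1)

def shrinkage_lda_direction(X, y, lam=0.1):
    """Two-class LDA direction with the pooled covariance shrunk toward a scaled
    identity (assumption: stands in for the paper's penalized LDA)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / max(len(X) - 2, 1)  # pooled within-class covariance
    p = X.shape[1]
    target = (np.trace(sigma) / p) * np.eye(p)
    sigma_reg = (1 - lam) * sigma + lam * target + 1e-8 * np.eye(p)  # keep it invertible
    w = np.linalg.solve(sigma_reg, mu1 - mu0)
    c = w @ (mu0 + mu1) / 2  # threshold at the midpoint of the projected class means
    return w, c

def best_group_split(X, y, groups, lam=0.1):
    """Fit one LDA split per group; keep the split with the largest Gini decrease."""
    n, parent, best = len(y), gini(y), None
    for g, cols in enumerate(groups):
        w, c = shrinkage_lda_direction(X[:, cols], y, lam)
        left = X[:, cols] @ w <= c
        nl = int(left.sum())
        if nl == 0 or nl == n:
            continue  # degenerate split, skip this group
        decrease = parent - (nl / n) * gini(y[left]) - ((n - nl) / n) * gini(y[~left])
        if best is None or decrease > best["decrease"]:
            best = {"group": g, "w": w, "c": c, "decrease": decrease, "left": left}
    return best

def grow(X, y, groups, importance, depth=0, max_depth=3, min_size=10, lam=0.1):
    """Recursively split nodes; accumulate weighted Gini decrease per group."""
    node = {"pred": int(y.mean() >= 0.5)}  # majority class at this node
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return node  # stopping criterion (hypothetical choice)
    split = best_group_split(X, y, groups, lam)
    if split is None or split["decrease"] <= 0:
        return node
    importance[split["group"]] += len(y) * split["decrease"]
    left = split["left"]
    node.update(group=split["group"], w=split["w"], c=split["c"])
    node["left"] = grow(X[left], y[left], groups, importance, depth + 1, max_depth, min_size, lam)
    node["right"] = grow(X[~left], y[~left], groups, importance, depth + 1, max_depth, min_size, lam)
    return node

def predict_one(node, x, groups):
    """Route one observation down the tree using each node's group projection."""
    while "w" in node:
        cols = groups[node["group"]]
        node = node["left"] if x[cols] @ node["w"] <= node["c"] else node["right"]
    return node["pred"]

# Usage on synthetic data: three hypothetical groups of two inputs,
# and only group 1 (columns 2 and 3) carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
groups = [[0, 1], [2, 3], [4, 5]]
y = (X[:, 2] + X[:, 3] > 0).astype(int)
importance = np.zeros(len(groups))
tree = grow(X, y, groups, importance)
preds = np.array([predict_one(tree, x, groups) for x in X])
print(importance, (preds == y).mean())
```

In this toy setup the informative group should receive by far the largest importance score, which mirrors, in a simplified way, how a per-group importance measure can be used to rank groups and drive group variable selection. Unlike univariate CART splits, each split here is a single linear combination of the variables in one group, so a node can still be read as "a rule on group j".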
Main file
ClassificationTreeAlgorithmForGroupedVariables_APoterie_JFDupuy_VMonbet_LRouviere.pdf (1.26 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01623570, version 1 (25-10-2017)
hal-01623570, version 2 (19-06-2018)
hal-01623570, version 3 (21-06-2018)
hal-01623570, version 4 (17-01-2019)

Identifiers

  • HAL Id: hal-01623570, version 3

Cite

Audrey Poterie, Jean-François Dupuy, Valérie Monbet, Laurent Rouviere. Classification tree algorithm for grouped variables. 2018. ⟨hal-01623570v3⟩
743 views
1859 downloads
