VSURF: An R Package for Variable Selection Using Random Forests

Abstract : This paper describes the R package VSURF. Based on random forests, it handles both regression and classification problems and returns two subsets of variables. The first is a subset of important variables that may include some redundancy, which can be useful for interpretation. The second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy starts with a preliminary ranking of the explanatory variables using the random forests permutation-based importance score, then proceeds with a stepwise forward strategy for variable introduction. Both subsets can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but these values can also be tuned by the user. The algorithm is illustrated on a simulated example, and its application to real datasets is presented.
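As a rough illustration of the workflow the abstract describes, the sketch below runs the package on R's built-in `iris` data and reads back the selected subsets. It assumes the VSURF package is installed from CRAN. The reduced values for `ntree`, `nfor.thres`, `nfor.interp` and `nfor.pred` are chosen here only to keep the run short; they are not the data-driven defaults mentioned above. The dataset choice and the seed are also assumptions made for this example, not taken from the paper.

```r
# Minimal sketch, assuming VSURF is installed: install.packages("VSURF")
library(VSURF)

# Random forests are stochastic, so fix a seed (arbitrary value) for repeatability.
set.seed(1)

# Predictors and response from the built-in iris data (a classification problem).
x <- iris[, 1:4]
y <- iris[, 5]

# Run the whole procedure. The small forest counts below are assumptions made
# for speed only; omit them to use the package defaults.
fit <- VSURF(x = x, y = y,
             ntree = 100, nfor.thres = 20,
             nfor.interp = 10, nfor.pred = 10)

# Column indices of x kept after the preliminary ranking and thresholding.
print(fit$varselect.thres)

# First subset: important variables, possibly redundant, for interpretation.
print(fit$varselect.interp)

# Second subset: smaller set aimed at prediction.
print(fit$varselect.pred)

# Map the prediction subset back to variable names.
print(colnames(x)[fit$varselect.pred])
```

The three components are nested by construction: the prediction subset is drawn from the interpretation subset, which is itself drawn from the variables retained after thresholding.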
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01251924
Contributor : Robin Genuer
Submitted on : Wednesday, September 20, 2017 - 9:01:58 AM
Last modification on : Tuesday, September 18, 2018 - 4:24:02 PM

File

genuer-poggi-tuleaumalot.pdf
Publisher files allowed on an open archive

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01251924, version 1

Citation

Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot. VSURF: An R Package for Variable Selection Using Random Forests. The R Journal, R Foundation for Statistical Computing, 2015, 7 (2), pp.19-33. ⟨hal-01251924v1⟩
