Generalization Error and Out-of-bag Bounds in Random (Uniform) Forests
Abstract
In the context of ensemble learning, especially for random forest models, the out-of-bag (OOB) procedure produces, from the training set alone, an estimate of the generalization error. The OOB error serves the same purpose as the cross-validation error, but has two specific properties. First, there exists an OOB classifier that leads to the OOB evaluation. Second, the OOB classifier is embedded in the forest classifier. We show in this paper that these two intrinsic properties yield simple conditions under which the test error is bounded by the OOB error. These conditions require only the usual assumptions: i.i.d. samples and the existence of first- and second-order moments. The main interest is that the OOB error is explicitly known, hence one only needs a training set, without any other assumption on the model behind the data. As a practical case, we use Random Uniform Forests (Ciss, 2015a), a variant of Random Forests (Breiman, 2001) that inherits all the properties of the latter, to show how the OOB bounds apply. We also provide an R package, randomUniformForest, allowing the reader to experiment with all the results described in the paper.
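The OOB procedure described above can be illustrated with a short sketch. This uses scikit-learn's RandomForestClassifier rather than the paper's randomUniformForest R package, and the synthetic dataset and all parameter values are illustrative assumptions; the point is only to show the OOB error being computed from the training set and compared against a held-out test error.

```python
# Sketch: OOB estimation of the generalization error with a random forest.
# Uses scikit-learn, not the paper's randomUniformForest package; the
# dataset and hyperparameters below are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With oob_score=True, each tree is evaluated on the bootstrap samples it
# did not see during training, giving an error estimate from the training
# set alone (no cross-validation folds, no separate validation set).
forest = RandomForestClassifier(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(X_train, y_train)

oob_error = 1.0 - forest.oob_score_
test_error = 1.0 - forest.score(X_test, y_test)
print(f"OOB error:  {oob_error:.3f}")
print(f"Test error: {test_error:.3f}")
```

On data of this kind the two errors are typically close, which is the empirical behavior the paper's bounds formalize.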
Origin: Files produced by the author(s)