Generalization Error and Out-of-bag Bounds in Random (Uniform) Forests
Abstract
In the context of ensemble learning, especially for random forest models, the out-of-bag (OOB) procedure produces, from the training set alone, an estimate of the generalization error. The OOB error serves the same purpose as the cross-validation error, but has two specific properties. First, there exists an OOB classifier that leads to the OOB evaluation. Second, the OOB classifier is embedded in the forest classifier. We show in this paper that these two intrinsic properties yield simple conditions under which the test error is bounded by the OOB error. These conditions require only the usual assumptions: i.i.d. samples and the existence of first- and second-order moments. The main interest is that the OOB error is explicitly known, hence one only needs a training set, without any other assumption on the model behind the data. As a practical case, we use Random Uniform Forests (Ciss, 2015a), a variant of Random Forests (Breiman, 2001) that inherits all the properties of the latter, to show how the OOB bounds apply. We also provide an R package, randomUniformForest, allowing the reader to experiment with all the results described in the paper.
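The OOB procedure described above can be illustrated with a short sketch. This uses scikit-learn's RandomForestClassifier rather than the paper's randomUniformForest R package, and the synthetic dataset and all parameter values are illustrative assumptions; the point is only to show the OOB error being computed from the training set and compared against a held-out test error.

```python
# Sketch: OOB estimation of the generalization error with a random forest.
# Uses scikit-learn, not the paper's randomUniformForest package; the
# dataset and hyperparameters below are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With oob_score=True, each tree is evaluated on the bootstrap samples it
# did not see during training, giving an error estimate from the training
# set alone (no cross-validation folds, no separate validation set).
forest = RandomForestClassifier(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(X_train, y_train)

oob_error = 1.0 - forest.oob_score_
test_error = 1.0 - forest.score(X_test, y_test)
print(f"OOB error:  {oob_error:.3f}")
print(f"Test error: {test_error:.3f}")
```

On data of this kind the two errors are typically close, which is the empirical behavior the paper's bounds formalize.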
Origin: Files produced by the author(s)