Analysis of purely random forests bias

Sylvain Arlot 1, 2 Robin Genuer 3, 4
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
4 SISTM - Statistics In System biology and Translational Medicine
Epidémiologie et Biostatistique [Bordeaux], Inria Bordeaux - Sud-Ouest
Abstract : Random forests are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem. As a first step, simplified models such as purely random forests have been introduced in order to shed light on the good performance of random forests. In this paper, we study the approximation error (the bias) of some purely random forest models in a regression framework, focusing in particular on the influence of the number of trees in the forest. Under some regularity assumptions on the regression function, we show that the bias of an infinite forest decreases at a faster rate (with respect to the size of each tree) than that of a single tree. As a consequence, infinite forests attain a strictly better risk rate (with respect to the sample size) than single trees. Furthermore, our results allow us to derive a minimum number of trees sufficient to reach the same rate as an infinite forest. As a by-product of our analysis, we also show a link between the bias of purely random forests and the bias of some kernel estimators.
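
The contrast between the bias of a single tree and that of an infinite forest can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical one-dimensional setting in which each tree partitions the line into cells of width 1/k, offset by a uniformly random shift, and the regression function is s(x) = sin(2πx), treated as periodic. Each idealized tree predicts the average of s over the cell containing x, and the expectation over the random shift is approximated by averaging over a fine grid of shifts. All function names are illustrative.

```python
import math


def s(x):
    """Assumed regression function (smooth and 1-periodic)."""
    return math.sin(2 * math.pi * x)


def cell_mean(x, shift, k):
    """Average of s over the cell containing x, for cells of width 1/k
    whose boundaries sit at (j + shift) / k for integer j."""
    h = 1.0 / k
    j = math.floor(x / h - shift)
    a = (j + shift) * h
    b = a + h
    # Exact integral of sin(2*pi*y) over [a, b], divided by the cell width.
    return (math.cos(2 * math.pi * a) - math.cos(2 * math.pi * b)) / (2 * math.pi * h)


def biases(x, k, n_shifts=2000):
    """Return (single-tree bias, infinite-forest bias) at the point x.

    The expectation over the shift is approximated on a midpoint grid."""
    shifts = [(i + 0.5) / n_shifts for i in range(n_shifts)]
    preds = [cell_mean(x, t, k) for t in shifts]
    tree = sum((s(x) - p) ** 2 for p in preds) / n_shifts
    mean_pred = sum(preds) / n_shifts
    forest = (s(x) - mean_pred) ** 2
    return tree, forest


def finite_forest_bias(x, k, q):
    """Bias of a forest of q independent trees: squared bias of the mean
    prediction plus the variance over partitions divided by q."""
    tree, forest = biases(x, k)
    return forest + (tree - forest) / q


if __name__ == "__main__":
    for k in (10, 20, 40, 80):
        tree, forest = biases(0.3, k)
        print(k, tree, forest, finite_forest_bias(0.3, k, q=50))
```

In this assumed setting, doubling k divides the single-tree bias by roughly 4 and the infinite-forest bias by roughly 16, which matches the qualitative claim that the forest bias decays at a faster rate in the tree size. The last function shows how a finite number of trees q interpolates between the two.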

https://hal.archives-ouvertes.fr/hal-01023596
Contributor : Sylvain Arlot
Submitted on : Monday, July 14, 2014 - 6:37:38 PM
Last modification on : Wednesday, September 28, 2016 - 4:16:39 PM
Document(s) archived on : Thursday, November 20, 2014 - 6:37:04 PM

Files

hal_v1_purfbias.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01023596, version 1
  • ARXIV : 1407.3939

Citation

Sylvain Arlot, Robin Genuer. Analysis of purely random forests bias. 2014. <hal-01023596>
