A large-sample theory for infinitesimal gradient boosting - Archive ouverte HAL
Preprint / Working Paper, Year: 2022

A large-sample theory for infinitesimal gradient boosting

Abstract

Infinitesimal gradient boosting is defined as the vanishing-learning-rate limit of the popular tree-based gradient boosting algorithm from machine learning (Dombry and Duchamps, 2021). It is characterized as the solution of a nonlinear ordinary differential equation in an infinite-dimensional function space, where the infinitesimal boosting operator driving the dynamics depends on the training sample. We consider the asymptotic behavior of the model in the large-sample limit and prove its convergence to a deterministic process. This infinite-population limit is again characterized by a differential equation, which depends on the population distribution. We explore some properties of this population limit: we prove that the dynamics makes the test error decrease, and we study its long-time behavior.
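As an illustration of the vanishing-learning-rate limit described above, here is a minimal sketch (not the authors' construction): least-squares gradient boosting with a decision-stump base learner, where the learning rate `nu` shrinks while the number of steps scales like `1/nu`, so each boosting pass acts as an Euler step for a limiting ODE. All names (`fit_stump`, `boost`, `nu`, `horizon`) and the toy data are hypothetical choices for this sketch.

```python
# Hypothetical sketch: gradient boosting for least-squares regression with a
# one-split stump base learner. As nu -> 0 with int(horizon / nu) steps, the
# trajectory t -> F_t approximates the solution of an ODE dF_t/dt = T(F_t),
# where T fits a stump to the current residuals.

def fit_stump(x, residuals):
    """Fit a single-split regression stump minimizing squared error."""
    best = None
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    for k in range(1, n):
        thr = (x[order[k - 1]] + x[order[k]]) / 2
        left = [residuals[i] for i in range(n) if x[i] <= thr]
        right = [residuals[i] for i in range(n) if x[i] > thr]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda z: ml if z <= thr else mr

def boost(x, y, nu, horizon):
    """Boost with learning rate nu for int(horizon / nu) steps, i.e. up to
    boosting 'time' t = horizon (an explicit Euler scheme for the limit ODE)."""
    stumps = []
    predict = lambda z: sum(nu * s(z) for s in stumps)
    for _ in range(int(horizon / nu)):
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict

# Toy data: a step function. Two learning rates run to the same boosting time;
# as nu shrinks, predictions converge toward the continuous-time limit.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
f_coarse = boost(x, y, nu=0.5, horizon=3.0)
f_fine = boost(x, y, nu=0.05, horizon=3.0)
print(f_coarse(4.0), f_fine(4.0))
```

On this toy data each stump fits the residual step exactly, so the prediction on the right branch after `k` steps is `1 - (1 - nu)**k`, which converges to `1 - exp(-t)` as `nu -> 0`; the finer learning rate is visibly closer to that limit at the same boosting time.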

Dates and versions

hal-03795853 , version 1 (04-10-2022)

Identifiers

Cite

Clement Dombry, Jean-Jil Duchamps. A large-sample theory for infinitesimal gradient boosting. 2022. ⟨hal-03795853⟩