Ranking Forests

Stéphan Clémençon

Preprint, Working Paper. Year: 2010

Role: Author
PersonId : 174491
IdHAL : stephan-clemencon
ORCID : 0000-0002-5879-9500
IdRef : 08905203X

Laboratoire Traitement et Communication de l'Information

Abstract

This paper examines how the aggregation and feature randomization principles underlying the RANDOM FOREST algorithm [1], originally proposed for classification and regression, can be adapted to bipartite ranking in order to improve the performance of scoring rules produced by the TREERANK algorithm [2], a recently developed tree induction method specifically tailored to this global learning problem. Since TREERANK may be viewed as a recursive implementation of a cost-sensitive version of the popular classification algorithm CART [3], with a cost that depends locally on the data lying within the node to split, various strategies can be considered for "randomizing" the features involved in the tree-growing stage. In parallel, ranking trees can be combined or averaged in several ways, including techniques inspired by rank aggregation methods recently popularized in Web applications. Ranking procedures based on such approaches are called RANKING FORESTS. Beyond preliminary theoretical background, results of experiments on simulated data are provided as evidence of their statistical performance.
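The three ingredients named in the abstract (bootstrap aggregation, feature randomization, and rank aggregation by a median procedure) can be illustrated with a minimal sketch. It is not the paper's method: the base learner below is a hypothetical stand-in (the best single feature by empirical AUC) rather than TREERANK, the consensus is a plain median of ranks, which may differ from the median procedure studied in the paper, and all function names (`auc`, `to_ranks`, `median_rank_aggregate`, `fit_base_ranker`, `ranking_forest`) are illustrative assumptions.

```python
import random
from statistics import median


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ordered correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def to_ranks(scores):
    """Rank 1 = highest score; tied items share the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def median_rank_aggregate(score_lists):
    """Consensus score of an item = minus the median of its ranks across base rankers."""
    rank_lists = [to_ranks(s) for s in score_lists]
    n_items = len(score_lists[0])
    return [-median(r[i] for r in rank_lists) for i in range(n_items)]


def fit_base_ranker(X, y, features):
    """Assumed stand-in for a ranking tree: best single feature (and orientation) by AUC."""
    best = None
    for f in features:
        a = auc([row[f] for row in X], y)
        sign, quality = (1, a) if a >= 0.5 else (-1, 1 - a)
        if best is None or quality > best[0]:
            best = (quality, f, sign)
    _, f, sign = best
    return lambda row: sign * row[f]


def ranking_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Fit base rankers on bootstrap samples with random feature subsets; return a scorer."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    rankers = []
    while len(rankers) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:                            # AUC needs both classes
            continue
        Xb = [X[i] for i in idx]
        feats = rng.sample(range(d), n_features)        # feature randomization
        rankers.append(fit_base_ranker(Xb, yb, feats))

    def score(X_new):
        return median_rank_aggregate([[r(row) for row in X_new] for r in rankers])

    return score
```

A scorer is obtained with `score = ranking_forest(X, y)`, where `X` is a list of feature rows and `y` a list of 0/1 labels; `auc(score(X_test), y_test)` then gives the empirical AUC of the consensus ranking. Aggregating ranks rather than raw scores keeps base rankers with different score scales comparable, which is the reason a rank-based consensus is used in this sketch.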

Keywords

Bipartite ranking; data with binary labels; ROC optimization; AUC criterion; tree-based ranking rules; bootstrap; bagging; rank aggregation; median procedure; feature randomization

Domains

Statistics [math.ST]; Theory [stat.TH]

Main file

Forest_IEEE_sub.pdf (453.73 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00452577

Submitted on: Tuesday, February 2, 2010, 16:20:44

Last modified on: Wednesday, April 17, 2024, 13:26:44

Long-term archiving on: Thursday, October 18, 2012, 14:15:25

Dates and versions

hal-00452577, version 1 (02-02-2010)

Identifiers

HAL Id: hal-00452577, version 1

Cite

Stéphan Clémençon. Ranking Forests. 2010. ⟨hal-00452577⟩

Collections

INSTITUT-TELECOM CNRS PARISTECH LTCI IDS S2A

161 views

315 downloads
