Extreme multi-label learning: comparisons of approaches and new proposals

Abstract : Stimulated by applications such as document or image annotation, multi-label learning has attracted strong interest over the last decade. However, standard algorithms cannot cope with the volume of recent extreme multi-label (XML) data, where the number of labels can reach millions. This thesis explores three directions to address the time and memory complexity of the problem: multi-label dimension reduction, optimization and implementation tricks, and tree-based methods. It proposes to unify the reduction approaches through a typology and two generic formulations, and to identify the most efficient ones through an original meta-analysis of results from the literature. A new approach is developed to analyze the benefit of coupling the reduction problem with the classification problem. To reduce the memory complexity of a classical one-vs-rest regression model while maintaining its predictive performance, we also propose an algorithm for estimating the largest useful parameters, following a strategy inspired by data stream analysis. Finally, we present a new algorithm called CRAFTML that learns an ensemble of diversified decision trees. Each tree performs a joint random reduction of the feature and label spaces and implements a very fast recursive partitioning strategy. CRAFTML performs better than other XML tree-based methods and is competitive with the most accurate methods that require supercomputers. The contributions of the thesis are completed by the presentation of a software tool called VIPE, developed with Orange Labs for multi-label opinion analysis.
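As an illustration of the "joint random reduction of the feature and label spaces" mentioned in the abstract, the following is a minimal sketch, not the thesis's actual implementation. It assumes a hashing-style sparse sign projection applied independently to both spaces; the function names (`random_sign_projection`, `joint_reduce`), the target dimensions, and the toy data are all hypothetical.

```python
import numpy as np

def random_sign_projection(n_in, n_out, rng):
    """Build a sparse projection matrix: each input dimension is sent to
    one randomly chosen output dimension with a random sign (+1 or -1).
    This is an assumed, hashing-style scheme chosen for illustration."""
    P = np.zeros((n_in, n_out))
    cols = rng.integers(0, n_out, size=n_in)      # target column per input dim
    signs = rng.choice([-1.0, 1.0], size=n_in)    # random sign per input dim
    P[np.arange(n_in), cols] = signs
    return P

def joint_reduce(X, Y, dx, dy, seed=0):
    """Project features X (n x d) and labels Y (n x L) to dx and dy dimensions."""
    rng = np.random.default_rng(seed)
    Px = random_sign_projection(X.shape[1], dx, rng)
    Py = random_sign_projection(Y.shape[1], dy, rng)
    return X @ Px, Y @ Py

# Toy data: 6 instances, 50 features, 200 sparse binary labels (hypothetical sizes).
rng = np.random.default_rng(42)
X = rng.random((6, 50))
Y = (rng.random((6, 200)) < 0.05).astype(float)

Xr, Yr = joint_reduce(X, Y, dx=8, dy=10)
print(Xr.shape, Yr.shape)
```

In an ensemble, each tree would presumably draw its own seed so that the projections, and therefore the trees, differ from one another; that choice is also an assumption of this sketch.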
Document type :
Theses

https://hal.archives-ouvertes.fr/tel-02010219
Contributor : Wissam Siblini
Submitted on : Wednesday, February 6, 2019 - 11:28:13 PM
Last modification on : Tuesday, March 26, 2019 - 9:25:22 AM

File

these.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-02010219, version 1

Citation

Wissam Siblini. Apprentissage multi-label extrême : Comparaisons d'approches et nouvelles propositions. Computer Science [cs]. Université de Nantes, Ecole Polytechnique, 2018. French. ⟨tel-02010219⟩
