
Apprentissage multi-label extrême : Comparaisons d'approches et nouvelles propositions (Extreme multi-label learning: comparison of approaches and new proposals)

Abstract : Driven by applications such as document and image annotation, multi-label learning has attracted strong interest over the last decade. However, standard algorithms cannot cope with the volume of recent extreme multi-label (XML) data, where the number of labels can reach millions. This thesis explores three directions for addressing the time and memory complexity of the problem: multi-label dimension reduction, optimization and implementation tricks, and tree-based methods. It proposes to unify the reduction approaches through a typology and two generic formulations, and to identify the most efficient ones with an original meta-analysis of results from the literature. A new approach is developed to analyze the benefit of coupling the reduction problem with the classification problem. To reduce the memory complexity of a classical one-vs-rest regression model while maintaining its predictive performance, we also propose an algorithm that estimates the largest useful parameters, following a strategy inspired by data stream analysis. Finally, we present a new algorithm called CRAFTML that learns an ensemble of diversified decision trees. Each tree performs a joint random reduction of the feature and label spaces and implements a very fast recursive partitioning strategy. CRAFTML performs better than other XML tree-based methods and is competitive with the most accurate methods, which require supercomputers. The contributions of the thesis are complemented by the presentation of a software tool called VIPE, developed with Orange Labs for multi-label opinion analysis.

Cited literature : 261 references
Contributor : Wissam Siblini
Submitted on : Wednesday, February 6, 2019 - 11:28:13 PM
Last modification on : Wednesday, April 27, 2022 - 3:51:25 AM
Long-term archiving on : Tuesday, May 7, 2019 - 3:19:18 PM




HAL Id : tel-02010219, version 1


Wissam Siblini. Apprentissage multi-label extrême : Comparaisons d'approches et nouvelles propositions. Computer Science [cs]. Université de Nantes, Ecole Polytechnique, 2018. In French. ⟨tel-02010219⟩


