Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Abstract: We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems, and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double-sample reduction. Experiments are carried out on the DMOZ and Wikipedia collections with 10,000 to 100,000 classes, where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption, and predictive performance with respect to state-of-the-art approaches.
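The abstract describes the reduction and the double sampling only at a high level. The following is a minimal sketch of the general idea, not the authors' exact scheme. It rests on assumptions of our own: classes are the integers 0..K-1; the first sampling step is modeled as a hypothetical per-class cap `per_class`, so that head classes of a long-tailed distribution stop dominating; the second step draws a hypothetical number `kappa` of negative classes per kept example instead of all K-1; and each binary item is a triple (x, y_a, y_b) labeled +1 when y_a is the true class and -1 otherwise. It neither builds a joint feature representation nor trains a classifier.

```python
import random
from collections import defaultdict


def double_sample(examples, num_classes, per_class=2, kappa=2, seed=0):
    """Hypothetical double-sampling multi-class -> binary reduction (sketch).

    examples: list of (x, y) with y in range(num_classes).
    Returns a list of ((x, y_a, y_b), label) items where label is +1 if
    y_a is the true class of x and -1 if y_b is.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)

    pairs = []
    for y, xs in by_class.items():
        # First sampling (assumed rule): keep at most `per_class` examples
        # of each class, which flattens a long-tailed class distribution.
        kept = rng.sample(xs, min(per_class, len(xs)))
        negatives = [c for c in range(num_classes) if c != y]
        for x in kept:
            # Second sampling (assumed rule): `kappa` negative classes
            # instead of all num_classes - 1 of them.
            for y_neg in rng.sample(negatives, min(kappa, len(negatives))):
                # Randomly order the two classes so both labels occur.
                if rng.random() < 0.5:
                    pairs.append(((x, y, y_neg), +1))
                else:
                    pairs.append(((x, y_neg, y), -1))
    return pairs


# Long-tailed toy data: class 0 has 100 examples, classes 1..9 have one each.
data = [(f"doc{i}", 0) for i in range(100)] + [(f"rare{c}", c) for c in range(1, 10)]
full = len(data) * (10 - 1)  # pairs in the unsampled reduction: 109 * 9
sampled = double_sample(data, num_classes=10)
print(full, len(sampled))  # 981 22
```

On this toy data the unsampled reduction would create 981 pairs, whereas the sketch keeps 2 examples of class 0 and 1 of each rare class, each paired with 2 negative classes, giving 22 pairs. The real trade-off between the sampling rates and the consistency guarantee is the subject of the paper and is not captured here.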
Document type:
Conference paper
2017 Conference on Neural Information Processing Systems, Dec 2017, Long Beach, United States. 2018, Advances in Neural Information Processing Systems 30

https://hal.archives-ouvertes.fr/hal-01769780
Contributor: Massih-Reza Amini
Submitted on: Wednesday, 18 April 2018, 13:30:12
Last modified on: Monday, 30 April 2018, 15:02:01

Identifiers

  • HAL Id: hal-01769780, version 1

Citation

Bikash Joshi, Massih-Reza Amini, Ioannis Partalas, Franck Iutzeler, Yury Maximov. Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification. 2017 Conference on Neural Information Processing Systems, Dec 2017, Long Beach, United States. 2018, Advances in Neural Information Processing Systems 30. 〈hal-01769780〉