Robust Bloom Filters for Large MultiLabel Classification Tasks

Abstract: This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low-dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approach is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large-scale multilabel datasets.
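
To make the Bloom filter representation mentioned in the abstract concrete, here is a minimal sketch of the standard (naive) encoding of a label set into a short binary vector, together with decoding by membership testing. It is not the robust method proposed in the paper. The function names, the salted SHA-256 hashing, and the parameter names `num_bits`, `num_hashes`, and `num_labels` are illustrative assumptions and do not come from the paper.

```python
import hashlib


def bit_positions(label, num_bits, num_hashes):
    """Map an integer label to num_hashes bit positions.

    Salted SHA-256 is an illustrative choice of hash family (assumption).
    """
    positions = []
    for k in range(num_hashes):
        digest = hashlib.sha256(f"{k}:{label}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def encode(label_set, num_bits, num_hashes):
    """Encode a label set as a binary vector: the bitwise OR of each label's code."""
    bits = [0] * num_bits
    for label in label_set:
        for pos in bit_positions(label, num_bits, num_hashes):
            bits[pos] = 1
    return bits


def decode(bits, num_labels, num_hashes):
    """Return every label whose bit positions are all set (standard membership test)."""
    num_bits = len(bits)
    return {
        label
        for label in range(num_labels)
        if all(bits[p] for p in bit_positions(label, num_bits, num_hashes))
    }


# Hypothetical sizes: 1000 labels compressed into 64 bits, 2 hash functions.
true_labels = {3, 17, 42}
code = encode(true_labels, num_bits=64, num_hashes=2)
recovered = decode(code, num_labels=1000, num_hashes=2)
# recovered always contains true_labels; it may also contain false positives.
```

In the reduction described in the abstract, each bit of the vector would be predicted by one binary classifier. With this naive decoding, a single bit wrongly predicted as 0 removes every label that hashes to that position, and a bit wrongly predicted as 1 can introduce spurious labels, which illustrates the kind of sensitivity to individual classifier errors that the abstract refers to.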
Document type:
Conference paper
Advances in Neural Information Processing Systems 26, Dec 2013, Lake Tahoe, United States. pp.1851-1859, 2013

Cited literature [15 references]

https://hal.archives-ouvertes.fr/hal-00942742
Contributor: Nicolas Usunier
Submitted on: Thursday, February 6, 2014 - 13:31:16
Last modified on: Saturday, December 1, 2018 - 01:25:33
Document(s) archived on: Tuesday, May 6, 2014 - 23:05:20

File

5083-robust-bloom-filters-for-...
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-00942742, version 1

Citation

Moustapha Cisse, Nicolas Usunier, Thierry Artières, Patrick Gallinari. Robust Bloom Filters for Large MultiLabel Classification Tasks. Advances in Neural Information Processing Systems 26, Dec 2013, Lake Tahoe, United States. pp.1851-1859, 2013. 〈hal-00942742〉

Metrics

Record views

403

File downloads

115