
Robust Bloom Filters for Large MultiLabel Classification Tasks

Abstract : This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low-dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approach is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large-scale multilabel datasets.
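To make the encoding idea in the abstract concrete, the following is a minimal sketch of the naive Bloom-filter representation of a label set, not the robust method the paper proposes. The function names (`label_bits`, `encode`, `decode`), the SHA-256-based hashing scheme, and the parameters `num_bits` and `num_hashes` are all assumptions made for illustration; the paper's actual construction may differ. In an MLC reduction, each position of the encoded vector would be predicted by one binary classifier, so a single wrong bit corresponds to a single classifier error.

```python
import hashlib
from typing import Iterable, List, Set


def label_bits(label: int, num_bits: int, num_hashes: int) -> List[int]:
    """Bit positions assigned to one label (hypothetical hashing scheme)."""
    positions = []
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{label}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def encode(labels: Iterable[int], num_bits: int, num_hashes: int) -> List[int]:
    """Encode a label set as a binary vector: the bitwise OR of each label's bits."""
    vec = [0] * num_bits
    for label in labels:
        for pos in label_bits(label, num_bits, num_hashes):
            vec[pos] = 1
    return vec


def decode(vec: List[int], num_labels: int, num_hashes: int) -> Set[int]:
    """Return every label whose bits are all set in vec.

    This scans all labels for simplicity, so it is linear in num_labels
    and says nothing about the sublinear inference claimed in the abstract.
    """
    num_bits = len(vec)
    return {
        label
        for label in range(num_labels)
        if all(vec[pos] for pos in label_bits(label, num_bits, num_hashes))
    }
```

With an error-free vector, decoding never loses a label from the encoded set, although it may return extra labels (false positives). If one of a label's bits is flipped to 0, as a single misclassification would do, that label disappears from the decoded set, which illustrates the lack of robustness the abstract refers to.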
Document type :
Conference papers

Cited literature [15 references]
Contributor : Nicolas Usunier
Submitted on : Thursday, February 6, 2014 - 1:31:16 PM
Last modification on : Thursday, September 19, 2019 - 2:20:04 PM
Document(s) archived on : Tuesday, May 6, 2014 - 11:05:20 PM


Publisher files allowed on an open archive


  • HAL Id : hal-00942742, version 1


Moustapha Cisse, Nicolas Usunier, Thierry Artières, Patrick Gallinari. Robust Bloom Filters for Large MultiLabel Classification Tasks. Advances in Neural Information Processing Systems 26, Dec 2013, Lake Tahoe, United States. pp.1851-1859. ⟨hal-00942742⟩


