Conference papers

Re-ranking Approach to Classification in Large-scale Power-law Distributed Category Systems

Abstract : For large-scale category systems, such as Directory Mozilla, which consist of tens of thousands of categories, earlier studies have empirically verified that the distribution of documents among categories can be modeled as a power-law distribution. This implies that a significant fraction of categories, referred to as rare categories, have very few documents assigned to them. This characteristic of the data makes it harder for learning algorithms to learn effective decision boundaries that correctly detect such categories in the test set. In this work, we exploit the distribution of documents among categories to (i) derive an upper bound on the accuracy of any classifier, and (ii) propose a ranking-based algorithm which aims to maximize this upper bound. The empirical evaluation on publicly available large-scale datasets demonstrates that the proposed method achieves not only higher accuracy but also much higher coverage of rare categories than state-of-the-art methods.
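To illustrate why a power-law distribution implies many rare categories, the following minimal sketch allocates documents to categories with sizes proportional to an inverse power of the category's rank. It is not the authors' method or data: the function names, the Zipf-style rank-size form, the exponent, the category and document counts, and the rarity threshold are all assumptions chosen for illustration.

```python
def zipf_category_sizes(num_categories, num_documents, exponent=1.0):
    """Hypothetical helper: category sizes proportional to rank ** -exponent."""
    weights = [rank ** -exponent for rank in range(1, num_categories + 1)]
    total = sum(weights)
    # Assumption: every category keeps at least one document.
    return [max(1, round(num_documents * w / total)) for w in weights]


def rare_fraction(sizes, threshold):
    """Fraction of categories holding at most `threshold` documents."""
    rare = sum(1 for size in sizes if size <= threshold)
    return rare / len(sizes)


# Illustrative numbers only, not taken from the paper's datasets.
sizes = zipf_category_sizes(num_categories=10_000, num_documents=200_000)
print(f"largest category: {sizes[0]} documents")
print(f"share of categories with <= 5 documents: {rare_fraction(sizes, 5):.2%}")
```

Under these assumed settings, well over half of the categories end up with five or fewer documents while a handful of head categories hold thousands, which is the imbalance the abstract describes.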


Contributor : Massih-Reza Amini
Submitted on : Tuesday, February 24, 2015 - 9:37:39 PM
Last modification on : Thursday, October 21, 2021 - 3:48:43 AM
Long-term archiving on : Tuesday, May 26, 2015 - 5:35:35 PM





Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih-Reza Amini. Re-ranking Approach to Classification in Large-scale Power-law Distributed Category Systems. ACM Special Interest Group on Information Retrieval (SIGIR 2014), Aug 2014, Gold Coast, Australia. pp.1059-1062, ⟨10.1145/2600428.2609509⟩. ⟨hal-01118830⟩


