Re-ranking Approach to Classification in Large-scale Power-law Distributed Category Systems

Rohit Babbar; Ioannis Partalas; Eric Gaussier; Massih-Reza Amini

doi:10.1145/2600428.2609509

Conference paper. Year: 2014


Rohit Babbar

Role: Author

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Ioannis Partalas

Role: Author

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Eric Gaussier

Role: Author
PersonId : 182833
IdHAL : eric-gaussier
ORCID : 0000-0002-8858-3233
IdRef : 074308297

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Massih-Reza Amini

Role: Author
PersonId : 747054
IdHAL : massih-reza-amini
ORCID : 0000-0001-9032-4233
IdRef : 132277042

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Abstract

For large-scale category systems, such as Directory Mozilla, which consist of tens of thousands of categories, earlier studies have empirically verified that the distribution of documents among categories can be modeled as a power-law distribution. This implies that a significant fraction of categories, referred to as rare categories, have very few documents assigned to them. This characteristic of the data makes it harder for learning algorithms to learn effective decision boundaries that can correctly detect such categories in the test set. In this work, we exploit the distribution of documents among categories to (i) derive an upper bound on the accuracy of any classifier, and (ii) propose a ranking-based algorithm that aims to maximize this upper bound. The empirical evaluation on publicly available large-scale datasets demonstrates that the proposed method achieves not only higher accuracy but also much higher coverage of rare categories than state-of-the-art methods.
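To make the notion of rare categories concrete, the following is a minimal sketch, not the authors' method. It assumes a Zipf-like rank-size law as one simple instance of a power law; the function names, the exponent, the corpus sizes, and the rarity threshold of 10 documents are all hypothetical choices made for illustration.

```python
def zipf_category_sizes(num_categories, largest, exponent):
    """Documents per category under an ASSUMED Zipf-like rank-size law.

    The category at rank r gets about largest / r**exponent documents,
    with a floor of one document per category.
    """
    return [max(1, int(largest / rank ** exponent))
            for rank in range(1, num_categories + 1)]


def rare_fraction(sizes, threshold):
    """Fraction of categories holding at most `threshold` documents."""
    return sum(1 for size in sizes if size <= threshold) / len(sizes)


def rare_document_share(sizes, threshold):
    """Share of all documents that belong to rare categories."""
    rare_docs = sum(size for size in sizes if size <= threshold)
    return rare_docs / sum(sizes)


# Hypothetical corpus: 10,000 categories, 5,000 documents in the largest one.
sizes = zipf_category_sizes(num_categories=10_000, largest=5_000, exponent=1.0)

print(f"rare categories:             {rare_fraction(sizes, threshold=10):.1%}")
print(f"documents in rare categories: {rare_document_share(sizes, threshold=10):.1%}")
```

Under these assumed parameters most categories fall below the threshold while holding a minority of the documents, which is the imbalance the abstract describes.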

Keywords

Large-scale classification; Power-law distribution

Domains

Machine Learning [cs.LG]; Information Retrieval [cs.IR]

Main file

SIGIR2014.pdf (288.98 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01118830

Submitted on: Tuesday, February 24, 2015, 21:37:39

Last modified on: Thursday, April 4, 2024, 18:23:17

Long-term archiving on: Tuesday, May 26, 2015, 17:35:35

Dates and versions

hal-01118830 , version 1 (24-02-2015)

Identifiers

HAL Id : hal-01118830 , version 1
DOI : 10.1145/2600428.2609509

Cite

Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih-Reza Amini. Re-ranking Approach to Classification in Large-scale Power-law Distributed Category Systems. ACM Special Interest Group on Information Retrieval (SIGIR 2014), Aug 2014, Gold Coast, Australia. pp.1059-1062, ⟨10.1145/2600428.2609509⟩. ⟨hal-01118830⟩


Collections

UGA CNRS LIG PERSYVAL-LAB ANR LIG_SIDCH LIG_SIDCH_APTIKAL

