On power law distributions in large-scale taxonomies

Abstract: Fat-tailed distributions occur in many large-scale physical and social complex systems, and different generating mechanisms have been proposed for them. In this paper, we study models that generate power law distributions in the evolution of large-scale taxonomies such as the Open Directory Project, which consist of websites assigned to one of tens of thousands of categories. The categories in such taxonomies are arranged in tree or DAG structures with parent-child relations among them. We first quantitatively analyse the formation process of such taxonomies, which leads to power law distributions as the stationary distributions. In the context of designing classifiers for large-scale taxonomies, which automatically assign unseen documents to leaf-level categories, we highlight how the fat-tailed nature of these distributions can be leveraged to analytically study the space complexity of such classifiers. Empirical evaluation of the space complexity on publicly available datasets demonstrates the applicability of our approach.
Document type:
Journal article
SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, Association for Computing Machinery (ACM), 2014, 16 (1), pp.47-56. 〈http://www.kdd.org/newsletter/explorations-june-2014-16-1〉. 〈10.1145/2674026.2674033〉

Cited literature: 32 references

https://hal.archives-ouvertes.fr/hal-01120164
Contributor: Massih-Reza Amini
Submitted on: Tuesday, 24 February 2015 - 21:48:41
Last modified on: Thursday, 11 October 2018 - 08:48:04
Document(s) archived on: Wednesday, 27 May 2015 - 10:52:27

File

sigkddExp.pdf
Files produced by the author(s)

Citation

Rohit Babbar, Cornelia Metzig, Ioannis Partalas, Eric Gaussier, Massih-Reza Amini. On power law distributions in large-scale taxonomies. SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, Association for Computing Machinery (ACM), 2014, 16 (1), pp.47-56. 〈http://www.kdd.org/newsletter/explorations-june-2014-16-1〉. 〈10.1145/2674026.2674033〉. 〈hal-01120164〉

Metrics

Record views: 342
File downloads: 143