Pagerank based clustering of hypertext document collections

Konstantin Avrachenkov; Vladimir Dobrynin; Danil Nemirovsky; Son Kim Pham; Elena Smirnova

doi:10.1145/1390334.1390549

Conference paper, Year: 2008

Konstantin Avrachenkov

Role: Author
PersonId: 11963
IdHAL: konstantin-avrachenkov
ORCID: 0000-0002-8124-8272
IdRef: 087245280

Models for the performance analysis and the control of networks

Vladimir Dobrynin

Role: Author

St Petersburg State University

Danil Nemirovsky

Role: Author

Models for the performance analysis and the control of networks

Son Kim Pham

Role: Author

University of California [San Diego]

Elena Smirnova

Role: Author

St Petersburg State University

Abstract

Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitions with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between the PRC clustering and the content-based clusterings.
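
The abstract names modularity and coverage as quality measures but does not define them. The sketch below is an illustration only, not the authors' PRC algorithm: it assumes the commonly used definitions (coverage as the fraction of links that stay inside a cluster, and Newman-Girvan modularity) and treats the link graph as undirected and unweighted. The toy graph and the names `EDGES` and `CLUSTER_OF` are hypothetical.

```python
from collections import defaultdict


def coverage(edges, cluster_of):
    """Fraction of edges whose two endpoints lie in the same cluster."""
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


def modularity(edges, cluster_of):
    """Newman-Girvan modularity, assuming an undirected, unweighted graph."""
    m = len(edges)
    intra = defaultdict(int)   # number of edges inside each cluster
    degree = defaultdict(int)  # sum of node degrees in each cluster
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    # Q = sum over clusters of (intra-cluster edge share - expected share)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy link graph: two triangles joined by a single link.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
CLUSTER_OF = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

if __name__ == "__main__":
    print("coverage:", coverage(EDGES, CLUSTER_OF))
    print("modularity:", modularity(EDGES, CLUSTER_OF))
```

On this toy graph, six of the seven links are intra-cluster, so coverage is 6/7, and the modularity works out to 5/14. Putting every node into a single cluster would raise coverage to 1 but drop modularity to 0, which is why the two measures are usually reported together.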

Domains

Networks and telecommunications [cs.NI]

Contributor: Konstantin Avrachenkov

https://inria.hal.science/inria-00565355

Submitted on: Friday, February 11, 2011, 18:43:34

Last modified on: Thursday, January 11, 2024, 11:24:05

Dates and versions

inria-00565355, version 1 (11-02-2011)

Identifiers

HAL Id: inria-00565355, version 1
DOI: 10.1145/1390334.1390549

Cite

Konstantin Avrachenkov, Vladimir Dobrynin, Danil Nemirovsky, Son Kim Pham, Elena Smirnova. Pagerank based clustering of hypertext document collections. International ACM SIGIR Conference on Research & Development in Information Retrieval, Jul 2008, Singapore, Singapore. pp. 873-874, ⟨10.1145/1390334.1390549⟩. ⟨inria-00565355⟩

Collections

INRIA INRIA2
