Random Web Crawls

Toufik Bennouas; Fabien de Montgolfier

Conference paper, Year: 2007

Toufik Bennouas

Role: Author
PersonId: 841067

Laboratoire d'informatique Algorithmique : Fondements et Applications

Fabien de Montgolfier

Role: Author
PersonId: 949013
ORCID: 0000-0002-3237-4256

Laboratoire d'informatique Algorithmique : Fondements et Applications

Abstract

This paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with its hyperlink structure, i.e., a Web crawl is viewed as a graph whose vertices are the pages and whose edges are the hypertext links. A Web crawl has a very particular structure; we recall some known results about it. We then propose a model that generates similar structures. Our model simply simulates a crawl, i.e., it builds and crawls the graph at the same time. The generated graphs have many of the known properties of Web crawls. Our model is simpler than most random Web graph models but captures the same properties. Note that it models the crawling process rather than the page-writing process modeled by Web graph models.
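The central idea of the abstract, generating the graph while crawling it instead of before crawling it, can be illustrated with a toy simulation. This is a minimal sketch and not the model from the paper: the function name `simulate_crawl`, the parameters `p_new` and `max_out`, the uniform out-degree, the breadth-first crawl order, and the in-degree-weighted choice of already-known targets are all assumptions made only for illustration.

```python
import random
from collections import deque


def simulate_crawl(n_pages, p_new=0.5, max_out=10, seed=0):
    """Toy crawl-and-build simulation (hypothetical, not the paper's model).

    A page exists only once some link to it has been generated, and its
    out-links are generated at the moment the crawler visits it.
    Returns (edges, indegree).
    """
    rng = random.Random(seed)
    edges = []              # generated hyperlinks as (source, target)
    indegree = [0]          # page 0 is the seed page
    frontier = deque([0])   # discovered but not yet crawled pages (BFS order, assumed)

    while frontier and len(indegree) < n_pages:
        page = frontier.popleft()
        out_degree = rng.randint(1, max_out)  # assumed uniform out-degree
        for _ in range(out_degree):
            if rng.random() < p_new and len(indegree) < n_pages:
                # Link to a brand-new page; it joins the crawl frontier.
                target = len(indegree)
                indegree.append(0)
                frontier.append(target)
            else:
                # Link to an already-known page, chosen with probability
                # proportional to in-degree + 1 (assumed preferential rule).
                weights = [d + 1 for d in indegree]
                target = rng.choices(range(len(indegree)), weights=weights)[0]
            indegree[target] += 1
            edges.append((page, target))

    # Pages still in the frontier were discovered but never crawled, so they
    # have no out-links: the result is a partial image, like a real crawl.
    return edges, indegree
```

For example, `edges, indeg = simulate_crawl(1000, seed=1)` yields a graph with at most 1000 pages. Tuning the assumed parameters changes how strongly in-degrees concentrate on early-discovered pages, which is the kind of property one would compare against real crawls.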

Keywords

web graph, crawling, crawl order, model, hyperlink structure

Domains

Data Structures and Algorithms [cs.DS]; Web

Main file

W3.pdf (779.49 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00159620

Submitted on: Tuesday, July 3, 2007, 16:24:01

Last modified on: Friday, March 24, 2023, 14:52:49

Long-term archiving on: Thursday, April 8, 2010, 22:28:15

Dates and versions

hal-00159620, version 1 (July 3, 2007)

Identifiers

HAL Id: hal-00159620, version 1

Cite

Toufik Bennouas, Fabien de Montgolfier. Random Web Crawls. 16th International Conference on World Wide Web, WWW 2007, 2007, Banff, Canada. pp. 451-460. ⟨hal-00159620⟩

Collections

UNIV-PARIS7, CNRS, LIAFA