Random Web Crawls - HAL open archive
Conference paper, Year: 2007

Random Web Crawls

Abstract

This paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with the hyperlink structure: a Web crawl is treated as a graph whose vertices are the pages and whose edges are the hypertext links. A Web crawl has a very particular structure, and we recall some known results about it. We then propose a model that generates similar structures. Our model simply simulates a crawl, i.e. it builds and crawls the graph at the same time. The generated graphs have many of the known properties of Web crawls. Our model is simpler than most random Web graph models but captures the same properties. Note that it models the crawling process, whereas Web graph models describe the page-writing process.
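The idea of building and crawling a graph at the same time can be illustrated with a small sketch. This is not the model defined in the paper: the abstract does not give its rules, so everything below is an assumption made for illustration, including the function name `random_crawl`, the fixed `out_degree`, the probability `p_new` that a link points to a previously unseen page, and the breadth-first (FIFO) crawl order.

```python
import random
from collections import deque


def random_crawl(n_pages, out_degree=3, p_new=0.5, seed=0):
    """Toy simulation (hypothetical rules, not the paper's model).

    Pages are created only when a crawled page links to them, so the
    graph is generated and explored in the same loop.
    Returns (edges, crawled), where edges is a list of (source, target).
    """
    rng = random.Random(seed)
    discovered = [0]        # every page id seen so far; page 0 is the seed URL
    frontier = deque([0])   # discovered but not yet crawled (FIFO order)
    edges = []
    crawled = []

    while frontier and len(crawled) < n_pages:
        page = frontier.popleft()
        crawled.append(page)
        for _ in range(out_degree):
            if rng.random() < p_new:
                # Assumed rule: the link discovers a brand-new page.
                target = len(discovered)
                discovered.append(target)
                frontier.append(target)
            else:
                # Assumed rule: the link points to an already known page.
                target = rng.choice(discovered)
            edges.append((page, target))
    return edges, crawled


edges, crawled = random_crawl(50)
```

Only crawled pages have outgoing links in the result, while discovered but uncrawled pages appear solely as link targets, which is one way a crawl differs from the underlying graph.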
Main file
W3.pdf (779.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00159620, version 1 (03-07-2007)

Identifiers

  • HAL Id: hal-00159620, version 1

Cite

Toufik Bennouas, Fabien de Montgolfier. Random Web Crawls. 16th international conference on World Wide Web, WWW 2007, 2007, Banff, Canada. pp.451-460. ⟨hal-00159620⟩
63 views
164 downloads
