Conference papers

Block clustering for web pages categorization

Abstract : With the growth of web-based applications and the increasing popularity of the World Wide Web (WWW), the WWW has become the largest source of information available in the world, which makes extracting relevant information increasingly difficult. Moreover, the content of web sites changes constantly, leading to continual changes in web users' behaviours. There is therefore significant interest in analysing web content data to better serve users. Our proposed approach, which is grounded on automatic textual analysis of a web site independently of its usage, attempts to define groups of documents dealing with the same topic. Both document clustering and word clustering are well-studied problems. However, most existing algorithms cluster documents and words separately rather than simultaneously. In this paper, we propose to apply a block clustering algorithm to categorize the pages of a web site according to their content. We report the results of our recent tests of the CROKI2 algorithm on a tourist web site.

https://hal.archives-ouvertes.fr/hal-01125843
Contributor : Laboratoire Cedric
Submitted on : Friday, March 6, 2015 - 11:30:18 AM
Last modification on : Wednesday, May 13, 2020 - 1:38:20 AM

Citation

Malika Charrad, Yves Lechevallier, Mohamed Ben Ahmed, Gilbert Saporta. Block clustering for web pages categorization. IDEAL 2009: Intelligent Data Engineering and Automated Learning, Sep 2009, Burgos, Spain. pp.260-267, ⟨10.1007/978-3-642-04394-9_32⟩. ⟨hal-01125843⟩
