Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing

Faraz Zaidi (1, 2), Arnaud Sallaberry (1, 2, 3), Guy Melançon (1, 2)
2 GRAVITE - Graph Visualization and Interactive Exploration
Université Sciences et Technologies - Bordeaux 1, Inria Bordeaux - Sud-Ouest, École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), CNRS - Centre National de la Recherche Scientifique : UMR
Abstract : The emergence of scale-free and small-world properties in real-world complex networks has stimulated considerable activity in the field of network analysis. One example of such a network comes from Content Analysis (CA) and Text Mining, where the goal is to analyze the contents of a set of web pages. In this network, the nodes are the words appearing in the web pages, and an edge connects two words if they appear together in a document. In this paper we present a CA system that helps users visually analyze these networks representing the textual contents of a set of web pages. Major contributions include a methodology to cluster complex networks based on duplication of nodes and identification of bridges, i.e. words that might be of interest to the user but have a low frequency in the document corpus. We have tested this system on a number of data sets, and users have found it very useful for exploring the data. One case study, based on browsing a collection of web pages on Wikipedia (http://en.wikipedia.org/wiki/Main_Page), is presented in detail.
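The network construction described in the abstract (words as nodes, an edge between two words that appear in the same document) can be illustrated with a minimal sketch. The function name `build_cooccurrence_network`, the whitespace tokenization, and the toy documents are assumptions made for illustration only; the abstract does not specify the paper's actual preprocessing, clustering, or bridge-detection steps, and none of them are reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(documents):
    """Build a word co-occurrence network from a list of document strings.

    Returns (nodes, edges):
      nodes maps each word to the number of documents containing it,
      edges maps each alphabetically ordered word pair (a, b) to the
      number of documents in which both words appear.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    nodes = defaultdict(int)
    edges = defaultdict(int)
    for text in documents:
        # Distinct words of this document, sorted so each pair has one canonical order.
        words = sorted(set(text.lower().split()))
        for word in words:
            nodes[word] += 1
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return dict(nodes), dict(edges)


# Hypothetical toy corpus standing in for the text of three web pages.
docs = [
    "graph network analysis",
    "network visualization",
    "graph visualization tools",
]
nodes, edges = build_cooccurrence_network(docs)
```

In this sketch the per-word document count stored in `nodes` is the kind of frequency information the abstract refers to when it describes bridges as low-frequency words; how the paper actually selects such words is not stated in this record.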
Document type : Conference papers

Cited literature : 31 references

https://hal.archives-ouvertes.fr/hal-00425144
Contributor : Faraz Zaidi
Submitted on : Wednesday, October 21, 2009 - 11:51:48 AM
Last modification on : Thursday, January 11, 2018 - 6:22:12 AM
Long-term archiving on : Tuesday, June 15, 2010 - 9:35:32 PM

File

zaidi09.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00425144, version 1

Citation

Faraz Zaidi, Arnaud Sallaberry, Guy Melançon. Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing. Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2009, Milano, Italy. pp.198-205. ⟨hal-00425144⟩

Metrics

Record views : 348
File downloads : 557