XML content warehousing: Improving sociological studies of mailing lists and web data

Benjamin Nguyen 1, 2 Antoine Vion 3 François-Xavier Dudouet 4 Dario Colazzo 5, 6 Ioana Manolescu 5, 6 Pierre Senellart 7
2 SMIS - Secured and Mobile Information Systems
PRISM - Parallélisme, Réseaux, Systèmes, Modélisation, UVSQ - Université de Versailles Saint-Quentin-en-Yvelines, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8144
6 LEO - Distributed and heterogeneous data and knowledge
UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: In this paper, we present guidelines for an XML-based approach to the sociological study of Web data, such as mailing lists or databases available online. An XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications through our case study of the profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many data sources to be added or cross-referenced without modifying existing data sets, while still allowing the structure to evolve. We also show that the existence of hidden data increases complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. Finally, we show that the data stored in the warehouse can be exported to commonly used advanced software for sociological analysis.
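To make two of the abstract's ideas more concrete, here is a minimal sketch under stated assumptions. It uses a hypothetical XML layout: the `mailinglist`, `thread`, `message` and `from` elements and the `inReplyTo` attribute are illustrative only and are not the XML Schema published in the paper. The sketch shows (1) a recursive query that follows reply links between messages, and (2) an export of thread co-participation as a weighted, tab-separated edge list, a plain format that network-analysis tools can typically import.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical warehouse fragment: element and attribute names are
# illustrative assumptions, not the schema described in the paper.
SAMPLE = """
<mailinglist name="www-example">
  <thread id="t1">
    <message id="m1"><from>alice@example.org</from><subject>Draft</subject></message>
    <message id="m2" inReplyTo="m1"><from>bob@example.com</from><subject>Re: Draft</subject></message>
    <message id="m3" inReplyTo="m2"><from>carol@example.net</from><subject>Re: Draft</subject></message>
  </thread>
  <thread id="t2">
    <message id="m4"><from>bob@example.com</from><subject>Agenda</subject></message>
    <message id="m5" inReplyTo="m4"><from>alice@example.org</from><subject>Re: Agenda</subject></message>
  </thread>
</mailinglist>
"""


def reply_depth(messages, msg_id):
    """Recursively follow inReplyTo links; a thread-starting message has depth 0."""
    parent = messages[msg_id].get("inReplyTo")
    return 0 if parent is None else 1 + reply_depth(messages, parent)


def co_participation_edges(root):
    """Count, for each unordered pair of senders, the threads both posted in."""
    edges = Counter()
    for thread in root.iter("thread"):
        senders = {m.findtext("from") for m in thread.iter("message")}
        for a, b in combinations(sorted(senders), 2):
            edges[(a, b)] += 1
    return edges


def to_edge_list(edges):
    """Render weighted edges as 'source<TAB>target<TAB>weight' lines."""
    return "\n".join(f"{a}\t{b}\t{w}" for (a, b), w in sorted(edges.items()))


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    messages = {m.get("id"): m for m in root.iter("message")}
    print("depth of m3:", reply_depth(messages, "m3"))
    print(to_edge_list(co_participation_edges(root)))
```

Storing threads and reply links as nested XML keeps the raw messages intact, so new sources or new derived views (such as the edge list above) can be added later without changing the stored data.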

Cited literature: 30 references

https://hal.archives-ouvertes.fr/hal-00616613
Contributor: Benjamin Nguyen
Submitted on: Tuesday, August 23, 2011, 2:40:09 PM
Last modification on: Wednesday, July 25, 2018, 1:24:37 AM
Long-term archiving on: Friday, November 25, 2011, 12:01:13 PM

File: bms2011.pdf (produced by the author(s))

Citation

Benjamin Nguyen, Antoine Vion, François-Xavier Dudouet, Dario Colazzo, Ioana Manolescu, et al. XML content warehousing: Improving sociological studies of mailing lists and web data. Bulletin de Méthodologie Sociologique / Bulletin of Sociological Methodology, SAGE Publications, 2011, 112 (1), pp. 5-31. ⟨10.1177/0759106311417540⟩. ⟨hal-00616613⟩


Metrics

Record views: 1026
File downloads: 362