HAL: hal-00616613, version 1

Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 112, 1 (2011) 5-31
XML content warehousing: Improving sociological studies of mailing lists and web data
Benjamin Nguyen 1, 2, Antoine Vion 3, François-Xavier Dudouet 4, Dario Colazzo 5, 6, Ioana Manolescu 5, 6, Pierre Senellart 7
For the WebStand collaboration
(2011-10)

In this paper, we present guidelines for an XML-based approach to the sociological study of Web data, such as the analysis of mailing lists or of databases available online. An XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications through our case study of the profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many data sources to be added or cross-referenced without modifying existing data sets, while leaving room for structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries over content, with far less dependence on the initial storage. Finally, we present the possibility of exporting the data stored in the warehouse to commonly used advanced software devoted to sociological analysis.
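As a rough illustration of what a recursive query over warehoused mailing-list content might look like, the following minimal Python sketch walks a nested thread of messages. The element and attribute names (`mailingList`, `message`, `author`, `affiliation`) and the sample data are assumptions made for this example only; they are not the XML Schema described in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical warehouse fragment. Replies are nested inside the message
# they answer; names and data are illustrative assumptions, not the
# paper's schema.
SAMPLE = """
<mailingList name="example-list">
  <message id="m1" author="alice">
    <subject>Draft proposal</subject>
    <message id="m2" author="bob">
      <subject>Re: Draft proposal</subject>
      <message id="m3" author="alice">
        <subject>Re: Re: Draft proposal</subject>
      </message>
    </message>
    <message id="m4" author="carol" affiliation="ExampleCorp">
      <subject>Re: Draft proposal</subject>
    </message>
  </message>
</mailingList>
"""


def thread_depth(message):
    """Recursively compute the longest reply chain starting at `message`."""
    replies = message.findall("message")  # direct replies only
    return 1 + max((thread_depth(reply) for reply in replies), default=0)


def authors_in_thread(message):
    """Collect every author in the thread, at any nesting level."""
    # Element.iter() visits the element itself and all its descendants.
    return {m.get("author") for m in message.iter("message")}


root = ET.fromstring(SAMPLE)
first = root.find("message")
depth = thread_depth(first)
authors = authors_in_thread(first)
print(depth, sorted(authors))
```

In this sketch, the optional `affiliation` attribute appears on a single message only, which hints at how a semi-structured store can take on extra information from a new source without rewriting the records already present.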
1:  Parallélisme, Réseaux, Systèmes d'information, Modélisation (PRISM)
CNRS : UMR8144 – Université de Versailles Saint-Quentin-en-Yvelines
2:  SMIS (INRIA Rocquencourt)
INRIA – CNRS : UMR8144 – Université de Versailles Saint-Quentin-en-Yvelines
3:  Laboratoire d'économie et de sociologie du travail (LEST)
CNRS : UMR6123 – Université de Provence - Aix-Marseille I – Université de la Méditerranée - Aix-Marseille II
4:  Institut de recherche interdisciplinaire en sociologie, économie, science politique (IRISES)
Université Paris IX - Paris Dauphine – CNRS : UMR7170
5:  Laboratoire de Recherche en Informatique (LRI)
CNRS : UMR8623 – Université Paris XI - Paris Sud
6:  LEO (INRIA Saclay - Ile de France)
INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
7:  Institut Télécom - Télécom ParisTech
Télécom ParisTech
Computer Science/Databases

Humanities and Social Sciences/Sociology
XML – Web Data Management – Mailing List Analysis