Conference Paper. Year: 2013

Integration Process for Multidimensional Textual Data Modeling

Abstract

In this paper, we propose an original approach to the text warehousing process. It is based on a decision-support architecture that combines classical data warehousing tasks with information retrieval (IR) techniques. We first propose a new ETL process, named ETL-Text, for integrating textual data, and then present a new Text Warehouse Model, denoted TWM, which takes into account both the structure and the semantics of textual data. TWM introduces new dimension types, including a metadata dimension and a semantic dimension. In addition, we propose a new analysis measure based on the language model widely used in IR. Our approach also uses Wikipedia as an external knowledge source to extract the semantics of textual documents. To validate the approach, we developed a prototype composed of several processing modules that illustrate the different steps of ETL-Text, and we used the 20 Newsgroups corpus for our experiments.
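The abstract does not give the formula of the proposed measure. As a hedged illustration of the kind of IR language model it refers to, the sketch below scores a document against a set of terms with a unigram language model and Dirichlet smoothing, a standard IR formulation: P(w|d) = (tf(w,d) + mu * P(w|C)) / (|d| + mu), summed in log space over the terms. The function name `dirichlet_score`, the parameter `mu` and its default value, and the choice of Dirichlet smoothing are all assumptions for illustration; this is not the authors' measure.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_counts, collection_len, mu=2000.0):
    """Hypothetical helper: log-likelihood of query_terms under a
    Dirichlet-smoothed unigram language model of doc_terms.

    collection_counts maps each term to its frequency in the whole corpus;
    collection_len is the total number of term occurrences in the corpus.
    mu is the smoothing parameter (the default here is illustrative only).
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        # Background probability of the term in the whole collection.
        p_coll = collection_counts.get(w, 0) / collection_len
        if p_coll == 0:
            # Term never occurs in the collection: skip it instead of taking log(0).
            continue
        # Smoothed document probability: document counts plus mu pseudo-counts
        # distributed according to the collection model.
        p_doc = (tf[w] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score


if __name__ == "__main__":
    # Toy corpus of two tokenized documents (illustrative data, not from the paper).
    docs = [["engine", "car", "car"], ["god", "faith", "car"]]
    coll = Counter(w for d in docs for w in d)
    total = sum(coll.values())
    for i, d in enumerate(docs):
        print(i, dirichlet_score(["car"], d, coll, total, mu=1.0))
```

A higher (less negative) score means the document's language model is more likely to generate the terms; smoothing keeps documents that lack a term from receiving a probability of zero.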
File not deposited

Dates and versions

hal-00911862, version 1 (30-11-2013)

Identifiers

  • HAL Id: hal-00911862, version 1

Cite

Rachid Aknouche, Ounas Asfari, Fadila Bentayeb, Omar Boussaid. Integration Process for Multidimensional Textual Data Modeling. 1st International Workshop in Software Evolution and Modernization SEM / ENASE 2013, Jul 2013, Angers, France. pp. 119-126. ⟨hal-00911862⟩
