Conference paper, Year: 2013

ETL-Text: Extract-Transform-Load Processes for Textual Data Warehousing

Abstract

Constructing the ETL (Extract-Transform-Load) process is one of the biggest tasks in building a data warehouse. ETL processes have received little research attention, both because of their difficulty and because there is no formal model for representing the ETL activities that map incoming data from different sources into a format suitable for loading into the warehouse. A main problem in warehousing multidimensional text databases is dealing with the content of their text cells. In this paper, we propose a model for textual data warehouse ETL processes called ETL-Text. It combines classical data warehousing tasks, information retrieval (IR) techniques, and information processing, in particular language modeling. Our approach uses Wikipedia as an external knowledge source to extract the semantics of textual documents. To validate our approach, we develop a prototype composed of several processing modules that illustrate the different ETL-Text processes. We also use the 20 Newsgroups corpus for our experiments.
File not deposited

Dates and versions

hal-00911861, version 1 (30-11-2013)

Identifiers

  • HAL Id: hal-00911861, version 1

Cite

Rachid Aknouche, Ounas Asfari, Fadila Bentayeb, Omar Boussaid. ETL-Text: Extract-Transform-Load Processes for Textual Data Warehousing. EPIA 2013 (16th Portuguese Conference on Artificial Intelligence), Sep 2013, Azores, Portugal. pp.308-319. ⟨hal-00911861⟩
