A Content-Driven ETL Processes for Open Data (ADBIS 2014)

Book chapter, Year: 2015

Abstract

Emerging statistical Open Data (OD) looks very promising for generating varied analysis scenarios for decision-making systems. Nevertheless, OD has problematic characteristics: semantic and structural heterogeneity, lack of schemas, autonomy, and dispersion. These characteristics challenge traditional Extract-Transform-Load (ETL) processes, since the latter generally deal with well-structured schemas. In this paper we propose content-driven ETL processes that automate the extraction phase "as far as possible", based only on the content of flat Open Data sources. Our processes rely on data annotations and data mining techniques to discover hierarchical relationships. The processed data are then transformed into instance-schema graphs to facilitate structural data integration and the definition of the multidimensional schemas of the data warehouse.
No file deposited

Dates and versions

hal-03190204, version 1 (06-04-2021)

Cite

Alain Berro, Imen Megdiche, Olivier Teste. A Content-Driven ETL Processes for Open Data (ADBIS 2014). Bassiliades, Nick; Ivanovic, Mirjana; Kon-Popovska, Margita; Palpanas, Themis; Trajcevski, Goce. New Trends in Database and Information Systems II: Selected papers of the 18th East European Conference on Advances in Databases and Information Systems and Associated Satellite Events, ADBIS 2014 Ohrid, Macedonia, September 7-10, 2014 - Proceedings II, 312 (II), Springer-Verlag, pp. 29-40, 2015, Advances in Intelligent Systems and Computing book series (AISC), 978-3-319-10518-5. ⟨10.1007/978-3-319-10518-5_3⟩. ⟨hal-03190204⟩