Journal article, Information Systems, Year: 2020

Discovering and merging related analytic datasets

Rutian Liu
Eric Simon
Bernd Amann
Stéphane Gançarski

Abstract

The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, despite the profusion of available datasets, it remains quite difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. This article describes a model and algorithms that exploit automatically extracted and user-defined semantic relationships to extend analytic datasets with new atomic or aggregated attribute values. Our framework is implemented as a REST service in SAP HANA and includes a careful analysis of, and practical solutions for, several complex data quality issues.
Main file
S0306437920300065.pdf (1.29 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02459098, version 1 (21-07-2022)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-02459098
DOI: 10.1016/j.is.2020.101495

Cite

Rutian Liu, Eric Simon, Bernd Amann, Stéphane Gançarski. Discovering and merging related analytic datasets. Information Systems, 2020, 91, pp.101495. ⟨10.1016/j.is.2020.101495⟩. ⟨hal-02459098⟩