Data quality in ETL process: A preliminary study

Abstract: The accuracy and relevance of Business Intelligence & Analytics (BI&A) rely on the ability to bring high-quality data into the data warehouse from both internal and external sources through the ETL process. This process is complex and time-consuming because it handles data with heterogeneous content and diverse quality problems. Ensuring data quality requires tracking quality defects along the ETL process. In this paper, we present the main ETL quality characteristics and provide an overview of existing approaches to data quality in the ETL process. We also present a comparative study of several commercial ETL tools to show to what extent these tools take data quality dimensions into account. To illustrate our study, we carry out experiments using an ETL-dedicated solution (Talend Data Integration) and a data quality-dedicated solution (Talend Data Quality). Based on our study, we identify and discuss quality challenges to be addressed in our future research.
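To make "tracking quality defects along the ETL process" concrete, the following is a minimal Python sketch. It is not the authors' approach and does not reproduce any Talend feature. It assumes two illustrative quality dimensions, completeness (share of non-empty cells) and uniqueness (share of distinct key values), and measures them after each step of a toy pipeline. The names `measure`, `extract`, `transform`, `run_pipeline`, `QualityReport` and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A record is a flat row: column name -> value (None or "" means missing).
Record = Dict[str, Optional[str]]


@dataclass
class QualityReport:
    stage: str           # ETL step after which the metrics were taken
    rows: int            # number of records at that point
    completeness: float  # share of non-missing cells over the checked fields
    uniqueness: float    # share of distinct key values among the records


def measure(stage: str, rows: List[Record], key: str, fields: List[str]) -> QualityReport:
    """Compute two illustrative quality dimensions on a batch of records."""
    total_cells = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) not in (None, ""))
    completeness = filled / total_cells if total_cells else 1.0
    keys = [r.get(key) for r in rows]
    uniqueness = len(set(keys)) / len(keys) if keys else 1.0
    return QualityReport(stage, len(rows), completeness, uniqueness)


def extract() -> List[Record]:
    """Hypothetical source data containing a duplicate and missing values."""
    return [
        {"id": "1", "name": "Alice", "city": "Paris"},
        {"id": "2", "name": "Bob", "city": None},
        {"id": "2", "name": "Bob", "city": None},  # duplicate key
        {"id": "3", "name": "", "city": "Lyon"},   # empty name
    ]


def transform(rows: List[Record]) -> List[Record]:
    """Drop records whose key was already seen, keeping the first occurrence."""
    seen = set()
    result: List[Record] = []
    for r in rows:
        if r["id"] in seen:
            continue
        seen.add(r["id"])
        result.append(dict(r))
    return result


def run_pipeline() -> Tuple[List[Record], List[QualityReport]]:
    """Run extract -> transform -> load and record quality after each step."""
    fields = ["id", "name", "city"]
    reports: List[QualityReport] = []
    warehouse: List[Record] = []  # in-memory stand-in for the target warehouse

    rows = extract()
    reports.append(measure("extract", rows, "id", fields))
    rows = transform(rows)
    reports.append(measure("transform", rows, "id", fields))
    warehouse.extend(rows)  # load step: append to the in-memory target
    reports.append(measure("load", warehouse, "id", fields))
    return warehouse, reports


if __name__ == "__main__":
    _, reports = run_pipeline()
    for rep in reports:
        print(f"{rep.stage}: rows={rep.rows}, "
              f"completeness={rep.completeness:.2f}, uniqueness={rep.uniqueness:.2f}")
```

In this toy run, deduplication in the transform step raises uniqueness from 0.75 to 1.00, while the empty name and missing city survive into the load step (completeness about 0.78), which shows how a defect left untreated at one step propagates to the warehouse.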
Document type: Conference paper

Cited literature: 39 references

https://hal.archives-ouvertes.fr/hal-02424279
Contributor: Samira Si-Said Cherfi
Submitted on: Thursday, January 9, 2020 - 10:38:47 AM
Last modified on: Thursday, February 6, 2020 - 10:26:11 AM

File

Publisher files allowed on an open archive

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Citation

Manel Souibgui, Faten Atigui, Saloua Zammali, Samira Si-Said Cherfi, Sadok Ben Yahia. Data quality in ETL process: A preliminary study. 23rd International Conference KES-2019, Sep 2019, Budapest, Hungary. pp. 676-687, ⟨10.1016/j.procs.2019.09.223⟩. ⟨hal-02424279⟩
