Abstract: The accuracy and relevance of Business Intelligence & Analytics (BI&A) depend on bringing high-quality data into the data warehouse from both internal and external sources through the ETL process. This process is complex and time-consuming because it handles data with heterogeneous content and diverse quality problems. Ensuring data quality requires tracking quality defects throughout the ETL process. In this paper, we present the main ETL quality characteristics and give an overview of existing approaches to data quality in the ETL process. We also present a comparative study of several commercial ETL tools to show the extent to which they take data quality dimensions into account. To illustrate our study, we carry out experiments using a dedicated ETL solution (Talend Data Integration) and a dedicated data quality solution (Talend Data Quality). Based on our study, we identify and discuss quality challenges to be addressed in our future research.
https://hal.archives-ouvertes.fr/hal-02424279
Contributor: Samira Si-Said Cherfi
Submitted on: Thursday, January 9, 2020 - 10:38:47 AM
Last modification on: Sunday, April 3, 2022 - 6:18:02 PM
Long-term archiving on: Saturday, April 11, 2020 - 12:09:45 PM
Manel Souibgui, Faten Atigui, Saloua Zammali, Samira Si-Said Cherfi, Sadok Ben Yahia. Data quality in ETL process: A preliminary study. 23rd International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Sep 2019, Budapest, Hungary. pp.676-687, ⟨10.1016/j.procs.2019.09.223⟩. ⟨hal-02424279⟩