Journal article, Journal of Intelligent Information Systems, Year: 2021

On data lake architectures and metadata management

Abstract

Over the past two decades, we have witnessed an exponential increase in data production worldwide. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues, which strongly challenge traditional data management and analysis systems. The concept of data lake was introduced to address these issues. A data lake is a large, raw data repository that stores and manages all of a company's data, whatever their format. However, the data lake concept remains ambiguous or fuzzy for many researchers and practitioners, who often confuse it with the Hadoop technology. Thus, we provide in this paper a comprehensive state of the art of the different approaches to data lake design. We particularly focus on data lake architectures and metadata management, which are key issues in successful data lakes. We also discuss the pros and cons of data lakes and their design alternatives.
Main file
manuscript.pdf (1.33 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03114365, version 1 (22-07-2021)

License

Attribution

Cite

Pegdwendé Nicolas Sawadogo, Jérôme Darmont. On data lake architectures and metadata management. Journal of Intelligent Information Systems, 2021, 56 (1), pp.97-120. ⟨10.1007/s10844-020-00608-7⟩. ⟨hal-03114365⟩