Metadata Systems for Data Lakes: Models and Features

Abstract : Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows data to be stored without any predefined schema. Data querying and analysis therefore depend on a metadata system that must be efficient and comprehensive. However, metadata management in data lakes remains an open issue, and criteria for evaluating its effectiveness are largely nonexistent. In this paper, we introduce MEDAL, a generic, graph-based model for metadata management in data lakes. We also propose evaluation criteria for data lake metadata systems in the form of a list of expected features. Finally, we show that our approach is more comprehensive than existing metadata systems.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-02157195
Contributor : Jérôme Darmont
Submitted on : Saturday, June 15, 2019 - 4:40:13 PM
Last modification on : Sunday, June 16, 2019 - 1:46:57 AM

Identifiers

  • HAL Id : hal-02157195, version 1

Citation

Pegdwendé Sawadogo, Etienne Scholly, Cécile Favre, Eric Ferey, Sabine Loudcher, et al. Metadata Systems for Data Lakes: Models and Features. 1st International Workshop on BI and Big Data Applications (BBIGAP@ADBIS 2019), Sep 2019, Bled, Slovenia. ⟨hal-02157195⟩
