Semantic sampling of existing databases through informative Armstrong databases

Abstract : Functional dependencies (FDs) and inclusion dependencies (INDs) convey most of the data semantics in relational databases and are very useful in practice since they generalize keys and foreign keys. Nevertheless, FDs and INDs are often unavailable, obsolete or lost in real-life databases. Several algorithms have been proposed for mining these dependencies, but the output is always in the same format: a simple list of dependencies, which is hard for the user to understand. In this paper, we define informative Armstrong databases (IADBs) as small subsets of an existing database that satisfy exactly the same FDs and INDs. They extend the classical notion of Armstrong databases, but are more suitable for understanding dependencies, since their tuples are real-world tuples. The main result of this paper is a bound on the size of an IADB in the case of non-circular INDs. A constructive proof of this result is given, from which an algorithm has been devised. An implementation was tested on a real-life database; the resulting database contains only 0.6% of the tuples of the initial database. More importantly, such semantic sampling of databases appears to be a key feature for understanding existing databases at the logical level.
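
The defining property of an IADB, restricted to FDs over a single relation, can be illustrated with a small check. This is only a sketch under stated assumptions, not the algorithm devised in the paper: the function names (holds, satisfied_fds, same_fds) and the toy relation are hypothetical, INDs are ignored, and FDs are enumerated by brute force, which is only practical for a handful of attributes.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two rows agreeing on lhs must also agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def satisfied_fds(rows, attrs):
    """Brute-force set of all FDs X -> A (A not in X) that hold in rows."""
    fds = set()
    for a in attrs:
        others = [b for b in attrs if b != a]
        # k = 0 covers the empty left-hand side (A is constant in rows).
        for k in range(len(others) + 1):
            for lhs in combinations(others, k):
                if holds(rows, lhs, a):
                    fds.add((lhs, a))
    return fds

def same_fds(sample, rows, attrs):
    """True if sample satisfies exactly the same FDs as rows."""
    return satisfied_fds(sample, attrs) == satisfied_fds(rows, attrs)

# Hypothetical toy relation, for illustration only.
attrs = ("emp", "dept", "city")
rows = [
    {"emp": 1, "dept": "A", "city": "Lyon"},
    {"emp": 2, "dept": "A", "city": "Lyon"},
    {"emp": 3, "dept": "B", "city": "Paris"},
    {"emp": 4, "dept": "B", "city": "Paris"},
    {"emp": 5, "dept": "C", "city": "Lyon"},
]

good_sample = rows[:3] + rows[4:]   # tuples 1, 2, 3 and 5
bad_sample = rows[:2]               # tuples 1 and 2 only

print(same_fds(good_sample, rows, attrs))
print(same_fds(bad_sample, rows, attrs))
```

Because a sample is a subset of the original tuples, every FD that holds in the database also holds in the sample; the sample only has to keep enough tuples to violate every FD that the database violates. In the toy relation, the two-tuple sample fails because dept and city become constant in it, so it satisfies FDs that the full relation does not.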
Document type :
Journal articles
https://hal.archives-ouvertes.fr/hal-01501490
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Tuesday, April 4, 2017 - 12:18:39 PM
Last modification on : Thursday, February 7, 2019 - 4:21:39 PM

Identifiers

  • HAL Id : hal-01501490, version 1

Citation

Fabien de Marchi, Jean-Marc Petit. Semantic sampling of existing databases through informative Armstrong databases. Information Systems, Elsevier, 2007, 32 (3), pp.446-457. ⟨hal-01501490⟩
