Counting Types for Massive JSON Datasets - HAL open archive
Conference paper. Year: 2017

Counting Types for Massive JSON Datasets

Mohamed-Amine Baazizi
Dario Colazzo
Giorgio Ghelli
  • Role: Author
  • PersonId: 1004291
Carlo Sartiani
  • Role: Author
  • PersonId: 1095200

Abstract

Type systems express structural information about data. They are human-readable, and hence crucial for understanding code, and they are endowed with a formal definition that makes them a fundamental tool when proving program properties. The internal data structures of a database store quantitative information about data; this information is essential for optimization purposes but is not used for documentation or for correctness proofs. In this paper we propose a new idea: raising part of the quantitative information from system-level structures to the type level. Our proposal is motivated by the problem of schema inference for massive collections of JSON data, which nowadays are often collected from external sources and stored in NoSQL systems without an a priori schema, making a posteriori schema inference extremely useful. NoSQL systems are oriented towards the management of heterogeneous data, and in this context we claim that quantitative information is important for assessing the relative weight of different variants. We propose a type system in which the same collection can be described at different levels of abstraction. Different abstraction levels serve different purposes, hence we describe a parametric inference mechanism in which a single parameter specifies the chosen trade-off between succinctness and precision of the inferred type. The algorithm is designed for massive JSON collections, and hence admits a simple and efficient map-reduce implementation.
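The central idea of the abstract is types annotated with counts that can be merged in a map-reduce fashion. A small Python sketch can make this concrete. The representation below is hypothetical and chosen only for illustration: a dict from a kind name such as `"Num"` or `"Rec"` to a variant carrying a `count`, together with the helper names `infer` and `fuse`. It is not the paper's type syntax or inference algorithm, and it leaves out the paper's parametric choice of abstraction level entirely. Each JSON value is first mapped to a counted type. Counted types are then merged by taking the union of their variants and summing the counts of matching ones. Because this merge is associative and commutative, partial results computed on separate chunks of a collection can be combined in any order, which is what permits a reduce step.

```python
from functools import reduce
from typing import Any, Dict

# Hypothetical counted-type representation (illustrative, not the paper's syntax):
# a dict from a kind name ("Null", "Bool", "Num", "Str", "Arr", "Rec") to a
# variant carrying the number of values of that kind. Record variants also map
# each field name to a counted type; array variants fuse all element types.
CountedType = Dict[str, Dict[str, Any]]


def kind_of(value: Any) -> str:
    """Classify a decoded JSON value (as produced by json.loads)."""
    if value is None:
        return "Null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "Bool"
    if isinstance(value, (int, float)):
        return "Num"
    if isinstance(value, str):
        return "Str"
    if isinstance(value, list):
        return "Arr"
    if isinstance(value, dict):
        return "Rec"
    raise TypeError(f"not a JSON value: {value!r}")


def infer(value: Any) -> CountedType:
    """Map step: the counted type of a single JSON value. Array elements
    are fused, so counts inside an array's item type may exceed 1."""
    kind = kind_of(value)
    variant: Dict[str, Any] = {"count": 1}
    if kind == "Rec":
        variant["fields"] = {name: infer(v) for name, v in value.items()}
    elif kind == "Arr":
        variant["items"] = reduce(fuse, (infer(v) for v in value), {})
    return {kind: variant}


def fuse(t1: CountedType, t2: CountedType) -> CountedType:
    """Reduce step: union the variants of two counted types, summing counts.
    Associative and commutative, so chunks can be fused in any order."""
    result: CountedType = {}
    for kind in t1.keys() | t2.keys():
        if kind not in t2:
            result[kind] = t1[kind]
        elif kind not in t1:
            result[kind] = t2[kind]
        else:
            v1, v2 = t1[kind], t2[kind]
            merged: Dict[str, Any] = {"count": v1["count"] + v2["count"]}
            if kind == "Rec":
                # A field missing on one side contributes an empty union, so the
                # total count of a field's variants is how many records have it.
                names = v1["fields"].keys() | v2["fields"].keys()
                merged["fields"] = {
                    n: fuse(v1["fields"].get(n, {}), v2["fields"].get(n, {}))
                    for n in names
                }
            elif kind == "Arr":
                merged["items"] = fuse(v1["items"], v2["items"])
            result[kind] = merged
    return result


if __name__ == "__main__":
    records = [{"id": 1, "name": "a"}, {"id": 2, "name": None}, {"id": "x3"}]
    schema = reduce(fuse, map(infer, records), {})
    fields = schema["Rec"]["fields"]
    print(schema["Rec"]["count"])        # 3
    print(fields["id"]["Num"]["count"])  # 2
    print(fields["id"]["Str"]["count"])  # 1
```

In this sketch, reducing the three sample records yields a single record variant with count 3. Its `id` field is `Num` twice and `Str` once. Its `name` field occurs in only two of the three records, once as `Str` and once as `Null`. These counts express the "relative weight of different variants" that the abstract refers to.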
Main file
dbpl17.pdf (732.36 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01674279, version 1 (02-01-2018)


Cite

Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani. Counting Types for Massive JSON Datasets. DBPL '17: Proceedings of The 16th International Symposium on Database Programming Languages, Sep 2017, Munich, Germany. pp. 1-12, ⟨10.1145/3122831.3122837⟩. ⟨hal-01674279⟩