
Human-in-the-Loop Schema Inference for Massive JSON Datasets

Abstract : JSON has established itself as a popular format for representing data whose structure is irregular or unknown a priori. JSON collections are usually massive and schema-less. Inferring a schema that describes the structure of these collections is crucial for formulating meaningful queries and for adopting schema-based optimizations. In recent work, we proposed a Map/Reduce schema inference approach that infers either a compact representation of the input collection or a precise description of every possible shape in the data. Since no single level of precision is ideal, it is more appealing to give the analyst the freedom to choose between different levels of precision interactively. In this paper, we describe a schema inference system offering this functionality.
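The trade-off between a compact and a precise schema can be illustrated with a small sketch. It is not the authors' system: it runs in a single process, using `map` and `functools.reduce` in place of a distributed Map/Reduce engine, and it assumes that "compact" means fusing all record types into one with optional fields, while "precise" means keeping one record type per distinct set of keys. The names `infer`, `fuse`, `show`, and `infer_schema` are hypothetical.

```python
import json
from functools import reduce

# Atomic JSON values and the type names used in this sketch (names are assumptions).
ATOMS = {bool: "Bool", int: "Num", float: "Num", str: "Str", type(None): "Null"}

def infer(value, precise):
    """Map step: describe one JSON value as a union (list) of alternatives."""
    if isinstance(value, dict):
        fields = {k: (infer(v, precise), False) for k, v in value.items()}
        return [{"kind": "record", "fields": fields}]
    if isinstance(value, list):
        elem = reduce(lambda a, b: fuse(a, b, precise),
                      (infer(v, precise) for v in value), [])
        return [{"kind": "array", "elem": elem}]
    return [{"kind": ATOMS[type(value)]}]

def same_class(a, b, precise):
    """Decide whether two alternatives should be merged into one."""
    if a["kind"] != b["kind"]:
        return False
    if precise and a["kind"] == "record":
        # Precise mode: only records with identical key sets are merged.
        return a["fields"].keys() == b["fields"].keys()
    return True

def merge(a, b, precise):
    """Merge two alternatives of the same class."""
    if a["kind"] == "record":
        fields = {}
        for k in a["fields"].keys() | b["fields"].keys():
            ta, opt_a = a["fields"].get(k, ([], True))  # missing key => optional
            tb, opt_b = b["fields"].get(k, ([], True))
            fields[k] = (fuse(ta, tb, precise), opt_a or opt_b)
        return {"kind": "record", "fields": fields}
    if a["kind"] == "array":
        return {"kind": "array", "elem": fuse(a["elem"], b["elem"], precise)}
    return a  # identical atomic types

def fuse(t1, t2, precise):
    """Reduce step: union of two types, merging alternatives of the same class."""
    out = list(t1)
    for alt in t2:
        for i, seen in enumerate(out):
            if same_class(seen, alt, precise):
                out[i] = merge(seen, alt, precise)
                break
        else:
            out.append(alt)
    return out

def show(t):
    """Render a type as text; '+' separates alternatives, '?' marks optional fields."""
    parts = []
    for alt in t:
        if alt["kind"] == "record":
            body = ", ".join(f"{k}{'?' if opt else ''}: {show(ft)}"
                             for k, (ft, opt) in sorted(alt["fields"].items()))
            parts.append("{" + body + "}")
        elif alt["kind"] == "array":
            parts.append("[" + show(alt["elem"]) + "]")
        else:
            parts.append(alt["kind"])
    return " + ".join(parts)

def infer_schema(docs, precise):
    """Infer a schema for an iterable of JSON strings, one document per string."""
    types = (infer(json.loads(d), precise) for d in docs)
    return reduce(lambda a, b: fuse(a, b, precise), types, [])

docs = ['{"a": 1, "b": "x"}', '{"a": 2}', '{"a": "z", "b": "y"}']
print(show(infer_schema(docs, precise=False)))  # {a: Num + Str, b?: Str}
print(show(infer_schema(docs, precise=True)))   # {a: Num + Str, b: Str} + {a: Num}
```

On this toy collection the compact schema is shorter but hides the fact that `b` never appears without `a` being present, whereas the precise schema lists each record shape separately. An interactive tool could let the analyst switch between the two views.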
Document type : Conference papers

Cited literature [11 references]

https://hal.archives-ouvertes.fr/hal-02560196
Contributor : Mohamed-Amine Baazizi
Submitted on : Friday, May 1, 2020 - 11:40:04 AM
Last modification on : Tuesday, March 23, 2021 - 9:28:02 AM

File

paper-318.pdf
Files produced by the author(s)

Identifiers

HAL Id : hal-02560196
DOI : 10.5441/002/edbt.2020.82

Citation

Mohamed-Amine Baazizi, Clément Berti, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani. Human-in-the-Loop Schema Inference for Massive JSON Datasets. EDBT 2020 - 23rd International Conference on Extending Database Technology, Mar 2020, Copenhagen, Denmark. pp.635-638, ⟨10.5441/002/edbt.2020.82⟩. ⟨hal-02560196⟩

Metrics

Record views : 98
File downloads : 180