Scaling Up Schema Discovery for RDF Datasets

Abstract : An increasing number of data sources are published on the Web, expressed in languages proposed by the W3C such as RDF. In these sources, data is not constrained by a schema: the data may differ from the schema-related statements provided in the source, and the schema may be incomplete or even missing. This makes the data sources difficult to use. Several works have addressed the problem of automatic schema discovery, but their scalability and their use in a big data context remain a challenge. In this work, we address this scalability issue, which is mainly related to the clustering algorithms at the core of schema discovery. To process large amounts of data, we propose building a condensed representation of the initial dataset by extracting patterns that represent all the existing combinations of properties. Clustering is then performed on the patterns instead of the initial dataset. In this paper, we describe our approach and present its implementation using a big data technology. We also present experimental evaluations performed on real datasets.
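The condensed representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pattern is simply the set of properties attached to an entity, the function name `extract_patterns` and the toy triples are hypothetical, and the clustering step and the big data technology used in the paper are left out.

```python
from collections import Counter, defaultdict


def extract_patterns(triples):
    """Count how many entities share each distinct combination of properties.

    Assumption for illustration: a "pattern" is the set of properties
    that appear on an entity, ignoring the property values.
    """
    props_by_entity = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_entity[subject].add(prop)
    # One frozenset per entity; identical sets collapse into one pattern.
    return Counter(frozenset(props) for props in props_by_entity.values())


# Toy data invented for this example.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:paper1", "dc:title", "A paper"),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:carol", "foaf:name", "Carol"),
]

patterns = extract_patterns(triples)
for pattern, count in patterns.items():
    print(sorted(pattern), count)
```

In this toy case four entities reduce to three patterns, so a clustering algorithm would run on three items instead of four. The saving grows when many entities share the same property combination.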
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01757028
Contributor : Stéphane Lopes
Submitted on : Tuesday, October 9, 2018 - 10:58:14 AM
Last modification on : Friday, March 6, 2020 - 11:42:02 AM
Long-term archiving on : Thursday, January 10, 2019 - 1:21:26 PM

File

Files produced by the author(s)

Identifiers

  • HAL Id : hal-01757028, version 1

Citation

Redouane Bouhamoum, Kenza Kellou-Menouer, Zoubida Kedad, Stéphane Lopes. Scaling Up Schema Discovery for RDF Datasets. Data Engineering meets the Semantic Web (DESWeb'2018), Apr 2018, Paris, France. ⟨hal-01757028⟩
