
HInT: Hybrid and Incremental Type Discovery for Large RDF Data Sources

Abstract: The rapid explosion of linked data has resulted in many weakly structured and incomplete data sources, where typing information might be missing. Yet type information is essential for a number of tasks such as query answering, integration, summarization, and partitioning. Existing approaches to type discovery either completely ignore the type declarations available in the dataset (implicit type discovery approaches) or rely only on existing types in order to complement them (explicit type enrichment approaches). Implicit type discovery approaches are based on instance grouping, which requires an exhaustive comparison between the instances. This process is expensive and not incremental. Explicit type enrichment approaches, on the other hand, are not able to identify new types, and they cannot process data sources that have little or no schema information. In this paper, we present HInT, the first incremental and hybrid type discovery system for RDF datasets, enabling type discovery in datasets where type declarations are missing. To achieve this goal, we incrementally identify the patterns of the various instances, then index and group them to identify the types. During the processing of an instance, our approach exploits its type information, if available, to improve the quality of the discovered types by guiding the classification of the new instance into the correct group and by refining the groups already built. We show analytically and experimentally that our approach outperforms competitors from both worlds, implicit type discovery and explicit type enrichment, in terms of efficiency, while also outperforming them in quality in most cases.
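
To make the general idea of incremental, hybrid grouping concrete, here is a minimal Python sketch. It is not the HInT algorithm: the abstract does not specify the pattern representation, the index, or the similarity measure. The sketch assumes that an instance's pattern is its set of outgoing properties, that similarity is plain Jaccard similarity against a group's accumulated property set, and that a declared `rdf:type` takes priority when placing an instance. All names (`Group`, `assign`, `discover_types`, the 0.5 threshold) are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed shorthand for the rdf:type predicate


class Group:
    """A candidate type: member instances plus an accumulated profile."""

    def __init__(self):
        self.members = []        # subjects assigned to this group
        self.properties = set()  # union of the members' property patterns
        self.types = set()       # declared types seen among the members


def instance_patterns(triples):
    """Map each subject to (property pattern, declared types), in input order."""
    props, types = defaultdict(set), defaultdict(set)
    order = []
    for s, p, o in triples:
        if s not in props and s not in types:
            order.append(s)
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            props[s].add(p)
    return [(s, frozenset(props[s]), set(types[s])) for s in order]


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign(groups, subject, pattern, declared, threshold=0.5):
    """Place one instance without revisiting instances processed earlier."""
    # Hybrid step (assumption): a shared declared type decides the group.
    target = next((g for g in groups if declared & g.types), None)
    if target is None:
        # Implicit step (assumption): fall back to the most similar group.
        best = max(groups, key=lambda g: jaccard(pattern, g.properties),
                   default=None)
        if best is not None and jaccard(pattern, best.properties) >= threshold:
            target = best
    if target is None:
        target = Group()
        groups.append(target)
    target.members.append(subject)
    target.properties |= pattern  # update the profile of the existing group
    target.types |= declared
    return target


def discover_types(triples, threshold=0.5):
    """Process instances one at a time and return the resulting groups."""
    groups = []
    for subject, pattern, declared in instance_patterns(triples):
        assign(groups, subject, pattern, declared, threshold)
    return groups


sample = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:email", "bob@example.org"),
    ("ex:paris", "ex:population", "2100000"),
    ("ex:paris", "ex:country", "ex:France"),
]
groups = discover_types(sample)
```

On this sample, the untyped instance `ex:bob` joins the group created for `ex:alice` because the two share the same properties, while `ex:paris` starts a group of its own. Comparing each new instance against group profiles rather than against every earlier instance is what keeps the sketch incremental. The only group refinement it performs is updating a group's profile as members arrive.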
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-03329877
Contributor: Équipe HAL UVSQ
Submitted on: Tuesday, August 31, 2021 - 1:02:03 PM
Last modification on: Wednesday, October 20, 2021 - 12:24:55 AM

Citation

Nikolaos Kardoulakis, Kenza Kellou-Menouer, Georgia Troullinou, Zoubida Kedad, Dimitris Plexousakis, et al. HInT: Hybrid and Incremental Type Discovery for Large RDF Data Sources. 33rd International Conference on Scientific and Statistical Database Management, SSDBM 2021, Jul 2021, Tampa (FL), United States. pp.97-108, ⟨10.1145/3468791.3468808⟩. ⟨hal-03329877⟩
