Conference paper. Year: 2022

Hierarchical clustering for property graph schema discovery

Abstract

The property graph model is becoming increasingly popular among users and is currently employed by several open-source and commercial graph database systems. Although property graphs are widely adopted, there is a lack of understanding of their underlying schema structure. In particular, the schema discovery problem consists of extracting the schema concepts from a property graph. A property graph schema helps build a concise description of the data it represents, to make it more digestible for humans and interactive processes, as well as usable for query optimization purposes. In this paper, we address the property graph schema discovery problem and introduce the GMMSchema method based on hierarchical clustering using a Gaussian Mixture Model, which accounts for both label and property information on nodes. We experimentally analyze the accuracy and performance of GMMSchema, compared to those of its closest competitor, and showcase its superiority on several commonly used datasets, including real-world ones, such as the Covid19 knowledge graph, as well as the Fib25 and Mb6 NeuPrint graphs.
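To make the idea of clustering nodes on both labels and properties concrete, here is a minimal sketch in plain Python. It is not the authors' GMMSchema implementation. The binary encoding, the hand-written two-component diagonal mixture fitted with EM, the recursive bisection, the stopping rules, the node dictionary layout, and all function names (`encode`, `gmm2_assign`, `split`, `leaves`) are assumptions chosen for illustration only.

```python
import math

def encode(nodes):
    """Binary vector per node over every observed label and property key (assumed encoding)."""
    feats = sorted({("label", l) for n in nodes for l in n["labels"]}
                   | {("prop", p) for n in nodes for p in n["props"]})
    return [[1.0 if name in n["labels" if kind == "label" else "props"] else 0.0
             for kind, name in feats] for n in nodes]

def gmm2_assign(X, iters=50, var_floor=1e-2):
    """Fit a 2-component diagonal Gaussian mixture with EM; return a hard 0/1 label per row."""
    n, d = len(X), len(X[0])
    # Deterministic initialisation: the first row and the row farthest from it.
    far = max(X, key=lambda x: sum((a - b) ** 2 for a, b in zip(x, X[0])))
    means = [list(X[0]), list(far)]
    var = [[1.0] * d for _ in range(2)]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each row.
        resp = []
        for x in X:
            logp = [math.log(weight[k]) - 0.5 * sum(
                        math.log(2 * math.pi * var[k][j])
                        + (x[j] - means[k][j]) ** 2 / var[k][j]
                        for j in range(d))
                    for k in range(2)]
            top = max(logp)
            e = [math.exp(v - top) for v in logp]
            resp.append([v / sum(e) for v in e])
        # M-step: re-estimate weights, means and floored variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weight[k] = nk / n
            for j in range(d):
                means[k][j] = sum(r[k] * x[j] for r, x in zip(resp, X)) / nk
                spread = sum(r[k] * (x[j] - means[k][j]) ** 2 for r, x in zip(resp, X)) / nk
                var[k][j] = max(var_floor, spread)
    return [0 if r[0] >= r[1] else 1 for r in resp]

def split(ids, vecs):
    """Recursively bisect node ids; a leaf is a list of ids, an inner node a (left, right) tuple."""
    X = [vecs[i] for i in ids]
    if all(x == X[0] for x in X):      # homogeneous cluster: one candidate node type
        return list(ids)
    z = gmm2_assign(X)
    left = [i for i, c in zip(ids, z) if c == 0]
    right = [i for i, c in zip(ids, z) if c == 1]
    if not left or not right:          # the mixture found no usable split
        return list(ids)
    return (split(left, vecs), split(right, vecs))

def leaves(tree):
    """Flatten the hierarchy into its leaf clusters."""
    if isinstance(tree, list):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

# Toy property graph nodes (hypothetical data, not from the paper's datasets).
nodes = [
    {"labels": {"Person"}, "props": {"name", "age"}},
    {"labels": {"Person"}, "props": {"name", "age"}},
    {"labels": {"Person"}, "props": {"name"}},
    {"labels": {"Movie"}, "props": {"title", "year"}},
    {"labels": {"Movie"}, "props": {"title", "year"}},
]
tree = split(list(range(len(nodes))), encode(nodes))
print(tree)
print(leaves(tree))
```

Each leaf of the resulting tree can be read as a candidate node type, and the nesting suggests how types relate: on this toy input the two `Person` variants should end up under a common branch, separate from the `Movie` nodes. A real implementation would more likely rely on an established mixture-model library and a principled stopping criterion than on this hand-rolled EM loop.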
Main file
EDBTShort_2022.pdf (936.1 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03692293, version 1 (2022-06-09)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

  • HAL Id: hal-03692293, version 1

Cite

Angela Bonifati, Stefania Dumbrava, Nicolas Mir. Hierarchical clustering for property graph schema discovery. 25th International Conference on Extending Database Technology (EDBT), Mar 2022, Edinburgh, United Kingdom. pp. 449-453. ⟨hal-03692293⟩