KeyRanker: Automatic RDF Key Ranking for Data Linking
Abstract
Automatic key discovery approaches for RDF datasets generate sets of discriminative properties that can be used to configure data linking systems relying on link specifications. These keys often come in large numbers, are generated independently for each of the two datasets to be linked, and lack any assessment of their usefulness for the linking task. We propose a novel generic algorithm that selects the keys valid in both datasets and ranks them by their individual likelihood of generating identity links. In addition, we explore the combined use of several complementary keys to improve on their individual performance. We evaluate our approach on diverse synthetic and real-world benchmark data, showing its robustness across different linking tools and domains.