Chaudron: Extending DBpedia with measurement

Abstract: Wikipedia is the largest collaborative encyclopedia and is the source for DBpedia, a central dataset of the LOD cloud. Wikipedia contains numerous numerical measures on the entities it describes, reflecting the general nature of the data it covers. The DBpedia Information Extraction Framework transforms semi-structured data from Wikipedia into structured RDF. However, this extraction framework offers only limited support for handling measurements in Wikipedia. In this paper, we describe the automated process that enables the creation of the Chaudron dataset. We propose an alternative to the traditional creation of mappings from the Wikipedia dump: we also use the rendered HTML, which avoids the template transclusion issue. This dataset extends DBpedia with more than 3.9 million triples and 949,000 measurements across every domain covered by DBpedia. We define a multi-level approach powered by a formal grammar that proves very robust for the extraction of measurements. An extensive evaluation against DBpedia and Wikidata shows that our approach largely surpasses its competitors for measurement extraction on Wikipedia infoboxes. Chaudron exhibits an F1-score of 0.89, while DBpedia and Wikidata reach 0.38 and 0.10 respectively on this extraction task.
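To make the extraction task concrete, the following is a minimal sketch of pulling a value and a unit out of the rendered text of an infobox cell. It is not the formal grammar described in the paper: a single regular expression stands in for it, and the names `Measurement`, `UNITS` and `extract_measurement`, as well as the small unit table, are assumptions chosen for illustration only.

```python
import re
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    value: float
    unit: str


# Hypothetical unit table, for illustration only; a real system would need
# a far larger vocabulary and unit conversion rules.
UNITS = {"km": "kilometre", "m": "metre", "kg": "kilogram", "g": "gram"}

# A number (with optional thousands separators and decimals) followed by a
# unit symbol. This regex is a stand-in, not the paper's grammar.
_PATTERN = re.compile(
    r"(?P<value>[-+]?\d{1,3}(?:,\d{3})+(?:\.\d+)?|[-+]?\d+(?:\.\d+)?)"
    r"\s*(?P<unit>[A-Za-z]+)\b"
)


def extract_measurement(text: str) -> Optional[Measurement]:
    """Return the first number followed by a known unit symbol, if any."""
    for match in _PATTERN.finditer(text):
        unit = match.group("unit")
        if unit in UNITS:
            value = float(match.group("value").replace(",", ""))
            return Measurement(value, UNITS[unit])
    return None
```

For example, `extract_measurement("1,234.5 km (767 mi)")` yields a value of 1234.5 with the unit `kilometre`, while a cell with a bare number and no recognized unit yields `None`.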
Document type: Conference paper
14th European Semantic Web Conference, May 2017, Portoroz, Slovenia. 14th ESWC proceedings, 2017, 〈http://2017.eswc-conferences.org/〉

Cited literature: 27 references

https://hal.archives-ouvertes.fr/hal-01477214
Contributor: Julien Subercaze
Submitted on: Monday, 27 February 2017 - 10:42:03
Last modified on: Thursday, 26 July 2018 - 01:11:08
Document(s) archived on: Sunday, 28 May 2017 - 12:46:28

File

document.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01477214, version 1

Citation

Julien Subercaze. Chaudron: Extending DBpedia with measurement. 14th European Semantic Web Conference, May 2017, Portoroz, Slovenia. 14th ESWC proceedings, 2017, 〈http://2017.eswc-conferences.org/〉. 〈hal-01477214〉

Metrics

Record views: 435

File downloads: 320