Extracting Linked Data from statistic spreadsheets - Archive ouverte HAL Accéder directement au contenu
Communication Dans Un Congrès Année : 2018

Extracting Linked Data from statistic spreadsheets

Résumé

Fact-checking journalists typically check the accuracy of a claim against some trusted data source. Statistic databases such as those compiled by state agencies are often used as trusted data sources, as they contain valuable, high-quality information. However, their usability is limited when they are shared in a format such as HTML or spreadsheets: this makes it hard to find the most relevant dataset for checking a specific claim, or to quickly extract from a dataset the best answer to a given query. In this work, we provide a conceptual model for the open data comprised in statistics published by INSEE, the national French economic and societal statistics institute. Then, we describe a novel method for extracting RDF Linked Open Data, to populate an instance of this model. We used our method to produce RDF data out of 20k+ Excel spreadsheets, and our validation indicates a 91% rate of successful extraction. Further, we also present a novel algorithm enabling the exploitation of such statistic tables, by (i) identifying the statistic datasets most relevant for a given fact-checking query, and (ii) extracting from each dataset the best specific (precise) query answer it may contain. We have implemented our approach and experimented on the complete corpus of statistics obtained from INSEE, the French national statistic institute. Our experiments and comparisons demonstrate the effectiveness of our proposed method.
Fichier principal
Vignette du fichier
paper.pdf (654.19 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01915148 , version 1 (07-11-2018)

Identifiants

  • HAL Id : hal-01915148 , version 1

Citer

Tien-Duc Cao, Ioana Manolescu, Xavier Tannier. Extracting Linked Data from statistic spreadsheets. Conférence sur la Gestion de Données – Principes, Technologies et Applications, Oct 2018, Bucarest, Romania. ⟨hal-01915148⟩
218 Consultations
367 Téléchargements

Partager

Gmail Facebook X LinkedIn More