Conference paper. Year: 2020

Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph

Georgeta Bordea
Stefano Faralli
  • Role: Author
  • PersonId: 1070004
Fleur Mougin
Paul Buitelaar
  • Role: Author
  • PersonId: 1070005
Gayo Diallo

Abstract

In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver-standard taxonomies, which can be extracted automatically only for a small subset of domains rooted in relatively focused nodes placed at an intermediate level of the knowledge graph. We propose an iterative methodology for extracting an application-specific gold-standard dataset from a knowledge graph, together with an evaluation framework for comparatively assessing the quality of noisy, automatically extracted taxonomies. We apply an existing state-of-the-art algorithm iteratively and propose several sampling strategies to reduce the amount of manual work needed for evaluation. A first gold-standard dataset for this task is released to the research community along with a companion evaluation framework. The dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.
Main file
LREC_2020.pdf (1.06 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02614678, version 1 (21-05-2020)

Identifiers

  • HAL Id: hal-02614678, version 1

Cite

Georgeta Bordea, Stefano Faralli, Fleur Mougin, Paul Buitelaar, Gayo Diallo. Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph. LREC'2020, May 2020, Marseille, France. ⟨hal-02614678⟩

Collections

U1219
