Conference paper, Year: 2011

Filtering and clustering relations for unsupervised information extraction in open domain

Abstract

Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information, making it possible to design more open information extraction systems. In this new domain of unsupervised information extraction, we focus on the task of extracting and characterizing new relations between a given set of entity types. One of the challenges of this task is dealing with the large number of candidate relations extracted from a large corpus. In this paper, we propose an approach for filtering such candidate relations based on heuristics and machine learning models. More precisely, we show that the best model for this task is a Conditional Random Field model, according to evaluations performed on a manually annotated corpus of about one thousand relations. We also tackle the problem of identifying semantically similar relations by clustering large sets of them. This clustering is achieved by combining a classical clustering algorithm with a method for the efficient identification of pairs of highly similar relations. Finally, we evaluate the impact of our relation filtering on this semantic clustering with both internal and external measures. Results show that the filtering procedure doubles the recall of the clustering while keeping the same precision.
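As a rough illustration of the clustering idea summarized in the abstract, the sketch below first finds pairs of highly similar relations and then groups them. It is not the authors' implementation: the whitespace tokenization, the cosine similarity over term counts, the threshold value, the inverted index used to limit comparisons, and the use of connected components as a stand-in for the "classical clustering algorithm" are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(relations, threshold):
    """Return sorted index pairs (i, j), i < j, with similarity >= threshold.

    An inverted index (hypothetical design choice) restricts comparisons
    to relations that share at least one term.
    """
    vectors = [Counter(text.lower().split()) for text in relations]
    index = defaultdict(list)
    for i, vec in enumerate(vectors):
        for term in vec:
            index[term].append(i)
    candidates = set()
    for postings in index.values():
        candidates.update(combinations(postings, 2))
    return sorted(
        (i, j) for i, j in candidates
        if cosine(vectors[i], vectors[j]) >= threshold
    )


def cluster(n, pairs):
    """Group items 0..n-1 into connected components of the similarity graph."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy relation phrases, invented for this example.
relations = ["founded the company", "founded the firm", "was born in", "born in"]
pairs = similar_pairs(relations, threshold=0.6)
clusters = cluster(len(relations), pairs)
```

On these toy phrases, the two "founded" relations end up in one group and the two "born in" relations in another. Filtering out noisy candidate relations before this step, as the abstract proposes, would reduce the number of spurious pairs the clustering has to handle.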
Main file
cikm0874-wang.pdf (368.21 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02282051, version 1 (09-09-2019)

Identifiers

HAL Id: hal-02282051
DOI: 10.1145/2063576.2063780

Cite

W. Wang, Romaric Besançon, Olivier Ferret, Brigitte Grau. Filtering and clustering relations for unsupervised information extraction in open domain. ACM International Conference on Information and Knowledge Management (CIKM 2011), Jan 2011, Glasgow, United Kingdom. ⟨10.1145/2063576.2063780⟩. ⟨hal-02282051⟩