Extracting hypernym relations from Wikipedia disambiguation pages: comparing symbolic and machine learning approaches

Abstract : Extracting hypernym relations from text is one of the key steps in the construction and enrichment of semantic resources. Several methods have been proposed in the literature. However, the relative strengths of each approach on the same corpus are still poorly understood, which makes it hard to take advantage of their complementarity. In this paper, we study how complementary two approaches of different natures are when identifying hypernym relations in a structured corpus that contains both well-written text and syntactically poor formulations, together with rich formatting. A symbolic approach based on lexico-syntactic patterns and a statistical approach using a supervised learning method are applied to a sub-corpus of the French Wikipedia composed of disambiguation pages. These pages, particularly rich in hypernym relations, contain both kinds of formulations. We evaluated each approach independently and then evaluated the combination of their individual results. We obtain the best results in the latter case, with an F-measure of 0.75. In addition, 55% of the identified relations, with respect to a reference corpus, are not expressed in the French DBpedia and could be used to enrich this resource.
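The evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that relations are (hyponym, hypernym) pairs, that the two approaches are combined by taking the union of their outputs, and that the F-measure is the balanced F1 score; the abstract does not specify any of these, and the function name `prf` and all data below are invented for illustration only.

```python
from typing import Set, Tuple

# A relation is modelled as a (hyponym, hypernym) pair (an assumption).
Relation = Tuple[str, str]


def prf(predicted: Set[Relation], reference: Set[Relation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted relations against a reference set."""
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for this sketch; not taken from the paper.
reference = {
    ("mercure", "planète"),
    ("mercure", "métal"),
    ("mercure", "dieu"),
    ("mercure", "magazine"),
    ("mercure", "fleuve"),
}
pattern_based = {("mercure", "planète"), ("mercure", "métal")}
learned = {("mercure", "métal"), ("mercure", "dieu"), ("mercure", "ville")}

# Combination by set union (an assumption about how results are merged).
combined = pattern_based | learned

for name, predicted in [("patterns", pattern_based),
                        ("learning", learned),
                        ("combined", combined)]:
    p, r, f = prf(predicted, reference)
    print(f"{name}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this toy setting the union recovers relations that each approach misses on its own, which is the kind of complementarity the abstract refers to; the numbers printed here say nothing about the paper's actual results.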

https://hal.archives-ouvertes.fr/hal-02074880
Contributor : Cécile Fabre
Submitted on : Thursday, March 21, 2019 - 7:54:53 AM
Last modification on : Thursday, October 17, 2019 - 8:54:34 AM

Identifiers

  • HAL Id : hal-02074880, version 1

Citation

Mouna Kamel, Cássia Trojahn, Adel Ghamnia, Nathalie Aussenac-Gilles, Cécile Fabre. Extracting hypernym relations from Wikipedia disambiguation pages: comparing symbolic and machine learning approaches. International Conference on Computational Semantics (IWCS), Sep 2017, Montpellier, France. ⟨hal-02074880⟩
