A Multilingual Approach to Discover Cross-Language Links in Wikipedia - HAL open archive
Conference paper, Year: 2015

A Multilingual Approach to Discover Cross-Language Links in Wikipedia

Abstract

Wikipedia is a well-known public and collaborative encyclopaedia consisting of millions of articles. Initially in English, the popular website has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and understanding of concepts in multiple languages, but have also been used in natural language processing applications, developments in linked open data, and the expansion of minor Wikipedia language versions. These applications motivate an automatic, robust, and accurate technique to identify cross-language links. In this paper, we present a multilingual approach called EurekaCL to automatically identify missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source), EurekaCL uses the multilingual and semantic features of BabelNet 2.0 to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation, carried out on 42,000 pairs of articles in eight language versions of Wikipedia, shows that our candidate selection and pruning procedures allow an effective selection of candidates, which significantly helps the determination of the correct article in the target language version.
Main file
wise.pdf (1.8 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01276205, version 1 (11-04-2018)

Identifiers

HAL Id: hal-01276205
DOI: 10.1007/978-3-319-26190-4_36

Cite

Nacéra Bennacer Seghouani, Mia Johnson Vioulès, Ariel López Maximiliano, Gianluca Quercini. A Multilingual Approach to Discover Cross-Language Links in Wikipedia. 16th International Conference Web Information Systems Engineering (WISE), Nov 2015, Miami, United States. ⟨10.1007/978-3-319-26190-4_36⟩. ⟨hal-01276205⟩
530 views
81 downloads
