Fast and efficient creation of a word sense disambiguation system for a lesser-resourced language

Abstract: We introduce a method to quickly build a Word Sense Disambiguation (WSD) system for a lesser-resourced language L. The method requires a Statistical Machine Translation (SMT) system that translates into L from a well-resourced language with semantically annotated corpora (here, English). We argue that the resources needed to develop an SMT system (parallel corpora) are easier to obtain than those needed for a WSD system (semantically annotated corpora and lexical resources). In the present work, we propose to translate a semantically annotated corpus from English into L and then to build a WSD system for L following the classical supervised WSD paradigm. We demonstrate the feasibility and genericity of the method by translating SemCor, an English corpus annotated with Princeton WordNet sense tags, from English into Bangla and from English into French, and we assess the approach on the Multilingual WSD task of SemEval 2013.
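The abstract does not say how the English sense tags are carried over to the translated corpus, nor which supervised classifier is trained on it. The Python sketch below assumes one common option: each tag is projected through a source-to-target word alignment, and a most-frequent-sense lookup stands in for a real supervised WSD system. The functions `project_senses`, `train_mfs` and `disambiguate`, the alignment format, and the sense labels are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical data format (an assumption, not taken from the paper):
# - src_tags: one sense label (or None) per English token
# - tgt_tokens: the machine-translated sentence in language L
# - alignment: (source_index, target_index) word-alignment pairs

def project_senses(src_tags, tgt_tokens, alignment):
    """Copy each English sense tag onto the target token(s) aligned to it."""
    projected = [None] * len(tgt_tokens)
    for s, t in alignment:
        if src_tags[s] is not None:
            projected[t] = src_tags[s]
    return list(zip(tgt_tokens, projected))

def train_mfs(tagged_corpus):
    """Learn the most frequent sense of each target word from projected data."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, sense in sentence:
            if sense is not None:
                counts[word.lower()][sense] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def disambiguate(model, tokens):
    """Tag each token with its most frequent sense, or None if unseen."""
    return [model.get(tok.lower()) for tok in tokens]

# Toy English -> French example; the sense labels are made-up stand-ins
# for Princeton WordNet sense keys.
src_tags = [None, "bank.n.finance", "raise.v.increase", "rate.n.price"]  # "the bank raised rates"
tgt_tokens = ["la", "banque", "a", "relevé", "ses", "taux"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 5)]

projected = project_senses(src_tags, tgt_tokens, alignment)
model = train_mfs([projected])
print(disambiguate(model, ["Les", "taux", "baissent"]))
# prints [None, 'rate.n.price', None]
```

In practice the projected corpus would feed a context-aware classifier rather than a frequency table; the lookup only keeps the example short.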
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-01856098
Contributor: Didier Schwab
Submitted on: Thursday, August 9, 2018 - 4:36:22 PM
Last modification on: Monday, February 11, 2019 - 4:36:02 PM
Document(s) archived on: Saturday, November 10, 2018 - 1:21:11 PM

File: NTBS-taln2015.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-01856098, version 1


Citation

Mohammad Nasiruddin, Andon Tchechmedjiev, Hervé Blanchon, Didier Schwab. Création rapide et efficace d'un système de désambiguïsation lexicale pour une langue peu dotée. 22ème conférence sur le Traitement Automatique des Langues Naturelles, Jun 2015, Caen, France. ⟨hal-01856098⟩

