Exploiting languages proximity for part-of-speech tagging of three French regional languages - HAL open archive
Journal article in Language Resources and Evaluation, year: 2019


Abstract

This paper presents experiments in part-of-speech tagging of low-resource languages. It addresses the case where neither labeled data in the target language nor a parallel corpus is available. We rely only on the proximity of the target language to a better-resourced language. We conduct experiments on three French regional languages. We exploit this proximity with two main strategies: delexicalization and transposition. The general idea is to learn a model on the (better-resourced) source language and then apply it to the (regional) target language. Delexicalization deals with the difference in vocabulary by creating abstract representations of the data. Transposition consists of modifying the target corpus so that the source models can be applied to it. We compare several methods and propose different strategies to combine them and to improve on the state of the art of part-of-speech tagging in this difficult scenario.
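To make the two strategies concrete, here is a minimal Python sketch, assuming a toy suffix-based tagger in place of the statistical taggers a real system would use. The functions `delexicalize`, `train_suffix_tagger`, `tag` and `transpose`, the small French training sentences, and the target-language words, lexicon and rewrite rules are all hypothetical illustrations, not the authors' implementation or data.

```python
"""Toy sketch of delexicalization and transposition for cross-lingual POS tagging.

Assumption: every feature, rule, lexicon entry and target-language word below is
invented for illustration; this is not the implementation used in the paper.
"""
from collections import Counter, defaultdict


def delexicalize(token, suffix_len=3):
    """Replace a word form by abstract features instead of the word itself."""
    return {
        "suffix": token[-suffix_len:].lower(),
        "is_capitalized": token[:1].isupper(),
        "is_digit": token.isdigit(),
    }


def train_suffix_tagger(tagged_sentences):
    """Learn a most-frequent-tag-per-suffix model on the source language.

    A real system would feed all delexicalized features to a statistical
    tagger; this toy model looks only at the suffix feature.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, pos in sentence:
            counts[delexicalize(token)["suffix"]][pos] += 1
    return {suffix: c.most_common(1)[0][0] for suffix, c in counts.items()}


def tag(model, tokens, default="NOUN"):
    """Apply the source model to (possibly target-language) tokens."""
    return [model.get(delexicalize(t)["suffix"], default) for t in tokens]


def transpose(tokens, lexicon, rewrite_rules):
    """Rewrite target-language tokens towards source-language forms.

    A token found in the bilingual lexicon is replaced by its source
    equivalent; otherwise each orthographic rewrite rule is applied in order.
    """
    result = []
    for token in tokens:
        form = token.lower()
        if form in lexicon:
            result.append(lexicon[form])
            continue
        for old, new in rewrite_rules:
            form = form.replace(old, new)
        result.append(form)
    return result


# Tiny annotated source-language (French) corpus.
SOURCE = [
    [("les", "DET"), ("chats", "NOUN"), ("mangent", "VERB")],
    [("des", "DET"), ("enfants", "NOUN"), ("chantent", "VERB")],
]
# Invented target-language sentence, lexicon and rewrite rules.
TARGET = ["lés", "kats", "mangeint"]
LEXICON = {"lés": "les"}
RULES = [("k", "ch"), ("eint", "ent")]

if __name__ == "__main__":
    model = train_suffix_tagger(SOURCE)
    # Delexicalization only: ['NOUN', 'NOUN', 'NOUN']
    print(tag(model, TARGET))
    # Transposition, then the same source model: ['DET', 'NOUN', 'VERB']
    print(tag(model, transpose(TARGET, LEXICON, RULES)))
```

In this toy setting, the delexicalized model can only generalize through shared abstract features, so it falls back to the default tag for forms such as `mangeint` whose suffix never occurs in the source data; transposing the target tokens towards source spelling first lets the same model recover the expected tags, which illustrates why combining the two strategies can help.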
File not deposited

Dates and versions

hal-02358020, version 1 (11-11-2019)

Identifiers

  • HAL Id: hal-02358020, version 1

Cite

Pierre Magistry, Anne-Laure Ligozat, Sophie Rosset. Exploiting languages proximity for part-of-speech tagging of three French regional languages. Language Resources and Evaluation, 2019, pp.1-26. ⟨hal-02358020⟩
