Journal articles

Exploiting languages proximity for part-of-speech tagging of three French regional languages

Abstract : This paper presents experiments in part-of-speech tagging of low-resource languages. It addresses the case where neither labeled data in the target language nor a parallel corpus is available. We rely only on the proximity of the target language to a better-resourced language, and conduct experiments on three French regional languages. We try to exploit this proximity with two main strategies: delexicalization and transposition. The general idea is to learn a model on the (better-resourced) source language and then apply it to the (regional) target language. Delexicalization deals with the difference in vocabulary by creating abstract representations of the data. Transposition consists in modifying the target corpus so that the source models can be applied to it. We compare several methods and propose different strategies for combining them to improve on the state of the art in part-of-speech tagging in this difficult scenario.

https://hal.archives-ouvertes.fr/hal-02358020
Contributor : Limsi Publications
Submitted on : Monday, November 11, 2019 - 10:47:21 AM
Last modification on : Thursday, February 27, 2020 - 8:42:01 PM

Identifiers

  • HAL Id : hal-02358020, version 1

Citation

Pierre Magistry, Anne-Laure Ligozat, Sophie Rosset. Exploiting languages proximity for part-of-speech tagging of three French regional languages. Language Resources and Evaluation, Springer Verlag, 2019, pp.1-26. ⟨hal-02358020⟩
