Journal article, Language Resources and Evaluation, Year: 2016

Reassessing the value of resources for cross-lingual transfer of POS tagging models

Abstract

When linguistically annotated data is scarce, as is the case for many under-resourced languages, one has to resort to less complete forms of annotations obtained from crawled dictionaries and/or through cross-lingual transfer. Several recent works have shown that learning from such partially supervised data can be effective in many practical situations. In this work, we review two existing proposals for learning with ambiguous labels which extend conventional learners to the weakly supervised setting: a history-based model using a variant of the perceptron, on the one hand; an extension of the Conditional Random Fields model on the other hand. Focusing on the part-of-speech tagging task, but considering a large set of ten languages, we show (a) that good performance can be achieved even in the presence of ambiguity, provided however that both monolingual and bilingual resources are available; (b) that our two learners exploit different characteristics of the training set, and are successful in different situations; (c) that in addition to the choice of an adequate learning algorithm, many other factors are critical for achieving good performance in a cross-lingual transfer setting.
No file deposited

Dates and versions

hal-01620904 , version 1 (21-10-2017)

Cite

Nicolas Pécheux, Guillaume Wisniewski, François Yvon. Reassessing the value of resources for cross-lingual transfer of POS tagging models. Language Resources and Evaluation, 2016, 50, pp. 1-34. ⟨10.1007/s10579-016-9362-7⟩. ⟨hal-01620904⟩