Conference paper. Year: 2019

Building a treebank for Occitan: what use for Romance UD corpora?

Abstract

This paper describes the application of delexicalized cross-lingual parsing to Occitan, with a view to building the first dependency treebank for this language. Occitan is a Romance language spoken in the south of France and in parts of Italy and Spain. It is a relatively low-resourced language and does not yet have a syntactically annotated corpus. To facilitate the manual annotation process, we train parsing models on the existing Romance corpora from the Universal Dependencies project and apply them to Occitan. Special attention is given to the effect of this cross-lingual annotation on the work of human annotators in terms of annotation speed and ease.

Domains

Linguistics
Main file
SyntaxFest-2019_paper_11.pdf (81.3 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02380554, version 1 (26-11-2019)

Identifiers

  • HAL Id: hal-02380554, version 1

Cite

Aleksandra Miletic, Myriam Bras, Louise Esher, Jean Sibille, Marianne Vergez-Couret. Building a treebank for Occitan: what use for Romance UD corpora? Syntax Fest, Aug 2019, Paris, France. ⟨hal-02380554⟩
129 views
139 downloads
