Mise au point d'une méthode d'annotation morphosyntaxique fine du serbe

Abstract : Developing a method for detailed morphosyntactic tagging of Serbian. This paper presents an experiment in detailed morphosyntactic tagging of the Serbian subcorpus of the parallel Serbian-French-English ParCoLab corpus. We enriched an existing POS annotation with finer-grained morphosyntactic properties in order to prepare the corpus for subsequent parsing stages. We compared three approaches: 1) manual annotation; 2) pre-annotation with a tagger trained on Croatian, followed by manual correction; 3) retraining the model on a small validated sample of the corpus (20K tokens), followed by automatic annotation and manual correction. The Croatian model remains stable overall when applied to Serbian texts, but because the two tagsets differ, substantial manual intervention was still required. A new model was trained on a validated sample of the corpus: it matches the accuracy of the existing model, but the observed speed-up in manual correction confirms that it is better suited to the task than the existing one.
Keywords : Morphosyntactic tagging, training corpus, Serbian.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01377060
Contributor : Cécile Fabre
Submitted on : Thursday, October 6, 2016 - 6:30:47 PM
Last modification on : Saturday, April 20, 2019 - 1:59:25 AM
Long-term archiving on : Saturday, January 7, 2017 - 12:40:32 PM

File

Miletic_et_al_TALN2016.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01377060, version 1

Citation

Aleksandra Miletic, Cécile Fabre, Dejan Stosic. Mise au point d'une méthode d'annotation morphosyntaxique fine du serbe. Conférence conjointe JEP-TALN-RECITAL 2016, ATALA, Jul 2016, Paris, France. pp.506-513. ⟨hal-01377060⟩
