
Mise au point d'une méthode d'annotation morphosyntaxique fine du serbe

Abstract: Developing a method for detailed morphosyntactic tagging of Serbian. This paper reports on the detailed morphosyntactic tagging of the Serbian subcorpus of ParCoLab, a parallel Serbian-French-English corpus. We enriched an existing POS annotation with finer-grained morphosyntactic properties in order to prepare the corpus for subsequent parsing stages. We compared three approaches: 1) manual annotation; 2) pre-annotation with a tagger trained on Croatian, followed by manual correction; 3) retraining the model on a small validated sample of the corpus (20K tokens), followed by automatic annotation and manual correction. The Croatian model remains stable overall when applied to Serbian texts, but because the two tagsets differ, substantial manual correction was still required. A new model was trained on a validated sample of the corpus: it reaches the same accuracy as the existing model, but the faster manual correction observed with it confirms that it is better suited to the task.
Keywords: morphosyntactic annotation, training corpus, Serbian.
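The comparison above rests on token-level agreement between automatic tags and a validated annotation. The following is a minimal sketch, not the authors' evaluation code, assuming MULTEXT-East-style tags whose first character encodes the part of speech (e.g. "Ncmsn" for a common masculine singular nominative noun). The hypothetical helper `tagging_scores` separates coarse POS agreement from full fine-grained tag agreement and counts the tokens an annotator would still have to correct.

```python
from typing import Sequence


def tagging_scores(gold: Sequence[str], predicted: Sequence[str]) -> dict:
    """Hypothetical helper (not from the paper): compare predicted tags with
    validated (gold) tags token by token.

    Assumes MULTEXT-East-style tag strings whose first character is the POS.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty input")
    n = len(gold)
    # Full match: every fine-grained feature (gender, number, case, ...) agrees.
    full = sum(g == p for g, p in zip(gold, predicted))
    # Coarse match: only the POS character (first position) agrees.
    pos = sum(g[:1] == p[:1] for g, p in zip(gold, predicted))
    return {
        "pos_accuracy": pos / n,
        "full_accuracy": full / n,
        "tokens_to_correct": n - full,  # tokens needing a manual fix
    }


if __name__ == "__main__":
    gold = ["Ncmsn", "Vmr3s", "Ncfsa"]  # illustrative tags only
    pred = ["Ncmsn", "Vmr3p", "Ncfsg"]
    print(tagging_scores(gold, pred))
```

With these illustrative tags, POS agreement is perfect while only one of the three full tags matches, leaving two tokens to correct: the kind of gap that a mismatch in fine-grained features can produce even when coarse categories are reliable.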
Document type: Conference paper

Cited literature: 10 references
Contributor: Cécile Fabre
Submitted on: Thursday, October 6, 2016 - 6:30:47 PM
Last modification on: Friday, September 18, 2020 - 2:34:36 PM
Long-term archiving on: Saturday, January 7, 2017 - 12:40:32 PM


Publisher files allowed on an open archive


  • HAL Id: hal-01377060, version 1


Aleksandra Miletic, Cécile Fabre, Dejan Stosic. Mise au point d'une méthode d'annotation morphosyntaxique fine du serbe. Conférence conjointe JEP-TALN-RECITAL 2016, ATALA, Jul 2016, Paris, France. pp. 506-513. ⟨hal-01377060⟩


