
From Treebank Construction to Syntactic Parsing of Serbian (De la constitution d'un corpus arboré à l'analyse syntaxique du serbe)

Abstract: This paper describes our work on a treebank for Serbian, which aims to provide the language with the tools and resources needed for parsing and, more broadly, to encourage research on Serbian in both natural language processing (NLP) and theoretical linguistics. Beyond the results of this resource-building project, we describe a treebank-building method that makes the most of the limited resources available for an under-resourced language, in terms of both technical resources (tools and corpora) and human resources (the annotation process). We show how best to exploit what is available in order to ease the manual work and speed up corpus enrichment while maintaining high-quality annotation. Because it is based on language-independent principles, this method should facilitate the creation of treebanks for other under-resourced languages.
Document type: Journal article

Contributor: Aleksandra Miletic
Submitted on: Tuesday, February 5, 2019 - 10:42:35 AM
Last modification on: Wednesday, November 17, 2021 - 12:31:05 PM
Long-term archiving on: Monday, May 6, 2019 - 2:21:49 PM


Publisher files allowed on an open archive


  • HAL Id : hal-02007248, version 1


Aleksandra Miletic, Cécile Fabre, Dejan Stosic. De la constitution d'un corpus arboré à l'analyse syntaxique du serbe. Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 2019. ⟨hal-02007248⟩


