De la constitution d'un corpus arboré à l'analyse syntaxique du serbe [From building a treebank to the syntactic parsing of Serbian]

Abstract: In this paper we describe our work on a treebank for Serbian, which aims to provide the language with the tools and resources needed for parsing and, more broadly, to encourage research on Serbian in both NLP (natural language processing) and theoretical linguistics. Beyond the results of this resource-building project, we describe a treebank-building method that makes the most of the limited resources available for an under-resourced language, both technical (tools and corpora) and human (the annotation process). We show how to take advantage of what is available in order to ease the manual work and speed up corpus enrichment while maintaining high-quality annotation. Because it is based on language-independent principles, this method should facilitate the creation of treebanks for other under-resourced languages.
Document type: Journal article

Cited literature: 66 references

https://hal.archives-ouvertes.fr/hal-02007248
Contributor: Aleksandra Miletic
Submitted on: Tuesday, February 5, 2019 - 10:42:35 AM
Last modification on: Wednesday, July 10, 2019 - 1:33:03 AM
Long-term archiving on: Monday, May 6, 2019 - 2:21:49 PM

File

Miletic_Fabre_Stosic-Du_corpus...
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-02007248, version 1

Citation

Aleksandra Miletic, Cécile Fabre, Dejan Stosic. De la constitution d'un corpus arboré à l'analyse syntaxique du serbe. Traitement Automatique des Langues, ATALA, 2019. ⟨hal-02007248⟩
