Construction du jeu d'étiquettes pour le parsing du serbe

Abstract : This paper presents the construction of a syntactic tagset for Serbian. The tagset is intended for building a training corpus for the parsing of Serbian, as part of the broader goal of linguistically annotating ParCoLab, a parallel corpus of Serbian, French and English. Since there are still no treebanks for Serbian, a manually annotated training corpus must be created. As parsing results can be affected by the structure and size of the tagset, its definition is a crucial stage. In the tag selection process, we were guided by two main goals: to reconcile the Serbian and French grammatical traditions, for technical and linguistic reasons, and to maintain comparability with existing tagsets for other Slavic languages. This strategy led us to 28 tags that ensure the coherence of annotation between different subcorpora and allow for the exploitation of tools developed for other languages in the manual annotation process.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01476701
Contributor : Aleksandra Miletic
Submitted on : Saturday, February 25, 2017 - 2:09:12 PM
Last modification on : Saturday, April 20, 2019 - 1:58:48 AM
Long-term archiving on : Friday, May 26, 2017 - 12:17:23 PM

File

tasla-2015-long-001.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01476701, version 1

Citation

Aleksandra Miletic, Cécile Fabre, Dejan Stosic. Construction du jeu d'étiquettes pour le parsing du serbe. 22e journées du Traitement Automatique des Langues Naturelles, Jun 2015, Caen, France. ⟨hal-01476701⟩
