Conference paper, Year: 2011

Corpus Linguistics for the Annotation Manager

Abstract

Hand-crafted annotated corpora are acknowledged as critical elements for Human Language Technologies, but systems have to be trained on domain-specific data to achieve a high level of performance. This is why numerous annotation campaigns are launched. The role of the annotation manager consists in designing the annotation protocol, sometimes selecting the source data, hiring the required number of annotators with adequate skills, writing the annotation guidelines, controlling the annotation process, and delivering the resulting annotated corpus with the expected quality. However, for a given task, the complexity of the annotation work seems to be highly dependent on the type of corpus to annotate. Since this affects both the cost and the quality of the annotation, it is an important issue for the annotation manager to tackle. This paper illustrates the role of corpus linguistics in the management of annotations through a specific annotation campaign. We show how the corpus characteristics affect all aspects of the annotation protocol: the design of the annotation guidelines, the selection of a sub-corpus for training, the duration of the annotators' training, the complexity of the annotation formalism, and the quality of the resulting annotation.
Main file
CorpusLinguisticsForAnnotationManager_Final.pdf (185.26 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00641571, version 1 (16-11-2011)

License

Attribution

Identifiers

  • HAL Id: hal-00641571, version 1

Cite

Karen Fort, Adeline Nazarenko, Claire Ris. Corpus Linguistics for the Annotation Manager. Corpus Linguistics 2011, Jul 2011, Birmingham, United Kingdom. ⟨hal-00641571⟩
240 views
187 downloads