Corpus Linguistics for the Annotation Manager

Abstract: Hand-crafted annotated corpora are acknowledged as critical for Human Language Technologies, but systems must be trained on domain-specific data to achieve a high level of performance. This is why numerous annotation campaigns are launched. The role of the annotation manager consists in designing the annotation protocol, sometimes selecting the source data, hiring the required number of annotators with adequate competences, writing the annotation guidelines, controlling the annotation process, and delivering the resulting annotated corpus with the expected quality. However, for a given task, the complexity of the annotation work seems to be highly dependent on the type of corpus to annotate. Since this affects both the cost and the quality of the annotation, it is an important issue for the annotation manager to tackle. This paper illustrates the role of corpus linguistics in the management of annotation through a specific annotation campaign. We show how the corpus characteristics affect all aspects of the annotation protocol: the design of the annotation guidelines, the selection of a sub-corpus for training, the duration of the annotators' training, the complexity of the annotation formalism, and the quality of the resulting annotation.
Document type:
Conference paper
Corpus Linguistics 2011, Jul 2011, Birmingham, United Kingdom. 2011

https://hal.archives-ouvertes.fr/hal-00641571
Contributor: Karën Fort
Submitted on: Wednesday, 16 November 2011 - 11:05:47
Last modified on: Thursday, 11 January 2018 - 06:26:27
Document(s) archived on: Friday, 16 November 2012 - 11:01:27

File

CorpusLinguisticsForAnnotation...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00641571, version 1

Citation

Karën Fort, Adeline Nazarenko, Claire Ris. Corpus Linguistics for the Annotation Manager. Corpus Linguistics 2011, Jul 2011, Birmingham, United Kingdom. 2011. 〈hal-00641571〉
