Influence of Pre-annotation on POS-tagged Corpus Development

Abstract: This article details a series of carefully designed experiments evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both quality and annotation time, with particular attention to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus under various experimental setups, either from scratch or starting from various pre-annotations. These experiments confirm and detail the previously observed gain in quality, while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a not-so-accurate tagger can help improve annotation speed.
Document type:
Conference paper
The Fourth ACL Linguistic Annotation Workshop, Jul 2010, Uppsala, Sweden. pp.56--63, 2010

https://hal.archives-ouvertes.fr/hal-00484294
Contributor: Karën Fort
Submitted on: Tuesday, 18 May 2010 - 11:40:01
Last modified on: Friday, 4 January 2019 - 17:33:24
Document(s) archived on: Friday, 19 October 2012 - 14:51:48

File

lawiv_KFBS_preannot.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00484294, version 1

Citation

Karën Fort, Benoît Sagot. Influence of Pre-annotation on POS-tagged Corpus Development. The Fourth ACL Linguistic Annotation Workshop, Jul 2010, Uppsala, Sweden. pp.56--63, 2010. 〈hal-00484294〉

Metrics

Record views

552

File downloads

212