Influence of Pre-annotation on POS-tagged Corpus Development

Abstract: This article details a series of carefully designed experiments aimed at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, from the points of view of both quality and time, with specific attention paid to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus under different experimental setups, either from scratch or using various pre-annotations. These experiments confirm and detail the gain in quality observed in previous work, while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a not-so-accurate tagger can help improve annotation speed.
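The abstract does not define its metrics. The following is only a minimal Python sketch, under assumed definitions, of how the three dimensions it mentions (quality, time, and bias) could be measured on token-aligned tag sequences. Token-level accuracy against the reference tags, tokens per minute, and the "retained error rate" used as a bias proxy are illustrative assumptions, as are all function names; this is not the authors' method.

```python
from typing import Sequence


def _check_aligned(*seqs: Sequence[str]) -> None:
    """Require every tag sequence to be non-empty and of equal length (one tag per token)."""
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("tag sequences must be non-empty and token-aligned")


def accuracy(annotated: Sequence[str], reference: Sequence[str]) -> float:
    """Quality (assumed metric): share of tokens whose manual tag matches the reference tag."""
    _check_aligned(annotated, reference)
    return sum(a == r for a, r in zip(annotated, reference)) / len(reference)


def tokens_per_minute(n_tokens: int, minutes: float) -> float:
    """Speed (assumed metric): number of tokens annotated per minute."""
    if minutes <= 0:
        raise ValueError("annotation time must be positive")
    return n_tokens / minutes


def retained_error_rate(annotated: Sequence[str],
                        pre_annotated: Sequence[str],
                        reference: Sequence[str]) -> float:
    """Bias proxy (hypothetical): among tokens where the pre-annotation is wrong,
    the share where the annotator kept the wrong pre-annotated tag."""
    _check_aligned(annotated, pre_annotated, reference)
    errors = [(a, p) for a, p, r in zip(annotated, pre_annotated, reference) if p != r]
    if not errors:
        return 0.0  # no pre-annotation errors, so nothing could be retained
    return sum(a == p for a, p in errors) / len(errors)


if __name__ == "__main__":
    # Toy, made-up data (not from the paper): Penn Treebank-style tags for five tokens.
    reference = ["DT", "NN", "VBZ", "JJ", "."]
    pre_annotated = ["DT", "NN", "NNS", "NN", "."]  # two pre-annotation errors
    annotated = ["DT", "NN", "NNS", "JJ", "."]      # one error kept, one corrected
    print(accuracy(annotated, reference))                           # 0.8
    print(retained_error_rate(annotated, pre_annotated, reference))  # 0.5
    print(tokens_per_minute(len(annotated), 2.5))                   # 2.0
```

Comparing these numbers between a from-scratch setup and a pre-annotated setup would show a quality or speed gain, while a high retained error rate would suggest that annotators tend to follow the pre-annotation even when it is wrong.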
Document type: Conference paper

Cited literature: 15 references

https://hal.archives-ouvertes.fr/hal-00484294
Contributor: Karën Fort
Submitted on: Tuesday, 18 May 2010, 11:40:01
Last modified on: Saturday, 15 February 2020, 01:49:20
Long-term archiving on: Friday, 19 October 2012, 14:51:48

File

lawiv_KFBS_preannot.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00484294, version 1


Citation

Karen Fort, Benoît Sagot. Influence of Pre-annotation on POS-tagged Corpus Development. The Fourth ACL Linguistic Annotation Workshop, Jul 2010, Uppsala, Sweden. pp. 56–63. ⟨hal-00484294⟩
