| HAL : hal-00484294, version 1 |
| The Fourth ACL Linguistic Annotation Workshop, Uppsala, Sweden (2010) |
|
|
|
|
| Influence of Pre-annotation on POS-tagged Corpus Development |
|
|
| Karën Fort 1, 2, Benoît Sagot 3 |
|
|
| (2010) |
|
|
| This article details a series of carefully designed experiments evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both quality and annotation time, with specific attention to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus under various experimental setups, either from scratch or starting from various pre-annotations. These experiments confirm and detail the previously observed gain in quality, while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a not-so-accurate tagger can help improve annotation speed. |
| 1 : | Institut de l'information scientifique et technique (INIST) |
| CNRS : UPS76 | |
| 2 : | Laboratoire d'informatique de Paris-nord (LIPN) |
| CNRS : UMR7030 – Université Paris XIII - Paris Nord | |
| 3 : | ALPAGE (INRIA Rocquencourt) |
| INRIA – Université Paris VII - Paris Diderot | |
| Domain | : | Computer Science/Document and Text Processing |
|
|
| http://hal.archives-ouvertes.fr/hal-00484294 | |
| oai:hal.archives-ouvertes.fr:hal-00484294 | |
| Contributor: Karën Fort | |
| Submitted on: Tuesday 18 May 2010, 11:40:01 | |
| Last modified on: Tuesday 3 July 2012, 17:07:21 | |