Influence of Pre-annotation on POS-tagged Corpus Development

Abstract : This article details a series of carefully designed experiments aimed at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both quality and annotation time, with particular attention to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus under various experimental setups, either from scratch or starting from various pre-annotations. These experiments confirm and detail the previously observed gain in quality, while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a not very accurate tagger can help improve annotation speed.
Document type :
Conference papers
Contributor : Karën Fort
Submitted on : Tuesday, May 18, 2010 - 11:40:01 AM
Last modification on : Thursday, February 7, 2019 - 5:53:08 PM
Long-term archiving on : Friday, October 19, 2012 - 2:51:48 PM




  • HAL Id : hal-00484294, version 1


Karën Fort, Benoît Sagot. Influence of Pre-annotation on POS-tagged Corpus Development. The Fourth ACL Linguistic Annotation Workshop, Jul 2010, Uppsala, Sweden. pp. 56-63. ⟨hal-00484294⟩


