This and that in native and learner English: From typology of use to tagset characterisation

Abstract : Learner corpus research now faces a multiplicity of tagsets. Because the tags assigned to a given part of speech (POS) vary from one tagset to another, cross-corpus analysis is difficult. In this paper, we approach this issue through a specific linguistic case: this and that. We propose a typology of their uses in both native and non-native corpora. Various tagsets are analysed in order to measure the relevance of the linguistic information they provide for this and that. We then compare how the tagsets treat this and that, and assess the benefits and flaws of manual fine-grained annotation versus automatic annotation. This study is a first step towards the automated annotation of this and that in various corpora, a process that would pave the way to corpus interoperability at the POS level.
Document type :
Book sections

Cited literature : 18 references
Contributor : Thomas Gaillat
Submitted on : Friday, May 19, 2017 - 11:22:16 AM
Last modification on : Monday, January 20, 2020 - 3:24:05 PM
Long-term archiving on: Monday, August 21, 2017 - 12:59:54 AM


Files produced by the author(s)


  • HAL Id : hal-01171279, version 1



Thomas Gaillat. This and that in native and learner English: From typology of use to tagset characterisation. Granger Sylviane; Gilquin Gaëtanelle; Meunier Fanny. Twenty years of learner research: looking back, moving ahead. Proceedings of the First Learner Corpus Research Conference (LCR 2011), Presses Universitaires de Louvain, 2013. ⟨hal-01171279⟩


