Towards an n-grammar of English

Abstract: In this chapter, we show how a new type of learner’s or student’s grammar can be developed from n-grams (sequences of 2, 3, 4, etc. items) automatically extracted from a large corpus, such as the Corpus of Contemporary American English (COCA). We first discuss the notion of the n-gram and its primary role in statistical language modelling. We then demonstrate that the part-of-speech (POS) tagging provided for lexical n-grams in COCA is useful for identifying frequent structural strings in the corpus. We propose using the hundred most frequent POS-based 5-grams as the content around which an ‘n-grammar’ of English can be constructed. We counter some obvious objections to this approach (e.g. that these patterns only scratch the surface, or that they overlap considerably) and describe extra features for this grammar, relating to the patterns’ productivity, corpus dispersion, functional description and practice potential.
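As a rough illustration of the idea summarised in the abstract, the following minimal sketch counts POS-based 5-grams in a tagged corpus and ranks them by frequency. It is not the authors' procedure: the function names, the toy sentences and the simplified tag labels (DET, NOUN, PREP) are all assumptions made for this example, and they do not reproduce COCA's data or tagset.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Yield every contiguous run of n POS tags as a tuple."""
    for i in range(len(tags) - n + 1):
        yield tuple(tags[i:i + n])


def most_frequent_pos_ngrams(tagged_sentences, n=5, top=100):
    """Count POS n-grams within each sentence and return the `top` most frequent."""
    counts = Counter()
    for sentence in tagged_sentences:
        # Keep only the tag of each (word, tag) pair.
        tags = [tag for _word, tag in sentence]
        counts.update(pos_ngrams(tags, n))
    return counts.most_common(top)


# Hypothetical toy corpus with invented, simplified tags (not COCA data).
toy_corpus = [
    [("the", "DET"), ("end", "NOUN"), ("of", "PREP"), ("the", "DET"), ("day", "NOUN")],
    [("the", "DET"), ("top", "NOUN"), ("of", "PREP"), ("the", "DET"), ("hill", "NOUN")],
    [("at", "PREP"), ("the", "DET"), ("end", "NOUN"), ("of", "PREP"),
     ("the", "DET"), ("day", "NOUN")],
]

ranked = most_frequent_pos_ngrams(toy_corpus, n=5, top=100)
for pattern, freq in ranked:
    print(" ".join(pattern), freq)
```

On this toy input, the pattern DET NOUN PREP DET NOUN ("the end of the day", "the top of the hill") ranks first. The six-word sentence also contributes a second 5-gram that shares four tags with the first, a small instance of the overlap among patterns that the abstract mentions.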
Document type: Book sections
Contributor: Natalia Grabar
Submitted on: Wednesday, January 4, 2017 - 6:16:23 PM
Last modification on: Tuesday, July 3, 2018 - 11:47:54 AM
Long-term archiving on: Wednesday, April 5, 2017 - 3:13:29 PM


De Knop_10_Cappelle.pdf


  • HAL Id : hal-01426700, version 1



Bert Cappelle, Natalia Grabar. Towards an n-grammar of English. Constructionist Approaches to Second Language Acquisition and Foreign Language Teaching, 2016. ⟨hal-01426700⟩


