Towards an n-grammar of English

Abstract: This chapter shows how a new type of learner’s or student’s grammar can be developed on the basis of n-grams (sequences of 2, 3, 4, etc. items) automatically extracted from a large corpus, such as the Corpus of Contemporary American English (COCA). The notion of the n-gram and its primary role in statistical language modelling are discussed first. The part-of-speech (POS) tagging provided for lexical n-grams in COCA is then shown to be useful for identifying frequent structural strings in the corpus. We propose using the hundred most frequent POS-based 5-grams as the content around which an ‘n-grammar’ of English can be constructed. We counter some obvious objections to this approach (e.g. that these patterns only scratch the surface, or that they overlap considerably) and describe additional features for this grammar, relating to the patterns’ productivity, corpus dispersion, functional description and practice potential.
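The extraction step mentioned in the abstract (ranking POS-based n-grams by frequency) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `ngrams` and `top_pos_ngrams`, the toy corpus and the simplified tags (DET, NOUN, PREP, etc.) are all assumptions made for illustration and do not reproduce the tagset or data format used in COCA.

```python
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-item sequences in a list, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def top_pos_ngrams(tagged_sentences, n=5, k=100):
    """Count POS-tag n-grams over tagged sentences and return the k most frequent.

    Each sentence is assumed to be a list of (word, tag) pairs.
    N-grams are not allowed to cross sentence boundaries.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        counts.update(ngrams(tags, n))
    return counts.most_common(k)


# Hypothetical toy corpus with invented, simplified tags.
toy_corpus = [
    [("the", "DET"), ("end", "NOUN"), ("of", "PREP"), ("the", "DET"), ("day", "NOUN")],
    [("the", "DET"), ("rest", "NOUN"), ("of", "PREP"), ("the", "DET"), ("world", "NOUN")],
    [("I", "PRON"), ("do", "VERB"), ("not", "ADV"), ("know", "VERB"), ("what", "PRON")],
]

top = top_pos_ngrams(toy_corpus, n=5, k=2)
for pattern, frequency in top:
    print(" ".join(pattern), frequency)
```

On a real corpus, the same counting would be run with n=5 and k=100 to obtain a candidate list of the hundred most frequent structural patterns; how ties, punctuation tags and sentence boundaries are handled is a design choice that this sketch leaves open.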
Document type: Book section

https://hal.archives-ouvertes.fr/hal-01426700
Contributor: Natalia Grabar
Submitted on: Wednesday, January 4, 2017 - 6:16:23 PM
Last modification on: Tuesday, July 3, 2018 - 11:47:54 AM
Long-term archiving on: Wednesday, April 5, 2017 - 3:13:29 PM

Identifiers

  • HAL Id: hal-01426700, version 1

Citation

Bert Cappelle, Natalia Grabar. Towards an n-grammar of English. Constructionist Approaches to Second Language Acquisition and Foreign Language Teaching, 2016. ⟨hal-01426700⟩
