Book chapter. Year: 2016

Towards an n-grammar of English

Bert Cappelle

Abstract

This chapter shows how a new type of learner’s or student’s grammar can be developed from n-grams (sequences of two, three, four or more items) automatically extracted from a large corpus such as the Corpus of Contemporary American English (COCA). We first discuss the notion of the n-gram and its primary role in statistical language modelling. We then demonstrate that the part-of-speech (POS) tagging provided for lexical n-grams in COCA is useful for identifying frequent structural strings in the corpus. We propose using the hundred most frequent POS-based 5-grams as the content around which an ‘n-grammar’ of English can be constructed. We counter some obvious objections to this approach (e.g. that these patterns only scratch the surface, or that they overlap considerably) and describe extra features of this grammar relating to the patterns’ productivity, corpus dispersion, functional description and practice potential.
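As an illustration of the extraction step described in the abstract, the following minimal Python sketch counts POS-based n-grams in a tagged corpus and ranks them by frequency. It is not the chapter's own procedure: the function names (`pos_ngrams`, `top_pos_ngrams`), the simplified tag labels and the three toy sentences are all hypothetical, and COCA's actual tagset and data format are not reproduced here.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Return every contiguous sequence of n POS tags as a tuple."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def top_pos_ngrams(tagged_sentences, n=5, k=100):
    """Count POS n-grams sentence by sentence and return the k most frequent.

    tagged_sentences is an iterable of sentences, each a list of
    (word, tag) pairs. N-grams never cross sentence boundaries.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        counts.update(pos_ngrams(tags, n))
    return counts.most_common(k)

# Hypothetical toy corpus with simplified tags (an assumption for
# illustration only; these are not COCA's tags).
toy_corpus = [
    [("the", "DET"), ("end", "NOUN"), ("of", "PREP"),
     ("the", "DET"), ("day", "NOUN")],
    [("the", "DET"), ("rest", "NOUN"), ("of", "PREP"),
     ("the", "DET"), ("world", "NOUN")],
    [("a", "DET"), ("lot", "NOUN"), ("of", "PREP"), ("the", "DET"),
     ("time", "NOUN"), ("is", "VERB"), ("lost", "VERB")],
]

top_pattern, frequency = top_pos_ngrams(toy_corpus, n=5, k=1)[0]
print(" ".join(top_pattern), frequency)
```

In this toy corpus the pattern DET NOUN PREP DET NOUN occurs once in each sentence, so it ranks first. On a real corpus, calling the function with `k=100` would yield a candidate list of the kind the abstract proposes as the content of an ‘n-grammar’.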

Domains

Linguistics
Main file
De Knop_10_Cappelle.pdf (6.73 MB)
Origin: Explicit agreement for this deposit

Dates and versions

hal-01426700, version 1 (04-01-2017)

Identifiers

  • HAL Id: hal-01426700, version 1

Cite

Bert Cappelle, Natalia Grabar. Towards an n-grammar of English. Constructionist Approaches to Second Language Acquisition and Foreign Language Teaching, 2016. ⟨hal-01426700⟩