
Investigating the scope of textual metrics for learner level discrimination and learner analytics

Abstract : This paper focuses on textual metrics that can be used as criterial features in ICALL systems. Empirical approaches to learner corpora include the identification of criterial features linked to learners’ proficiency levels. To develop ICALL systems that give feedback on proficiency level, it is necessary to identify which metrics significantly discriminate learners at a given stage (Crossley et al. 2011; Hawkins and Filipović 2012; Arnold et al. 2018; Pilán and Volodina 2018). However, the metrics also need to be intuitive for learners in their metacognitive learning processes. Our research question concerns the significance of a scope-oriented taxonomy of metrics. To that end, we propose a fine-grained taxonomy based on the scope of the metrics to support feedback at the word, sentence or text level. The formulae of the metrics rely on different types of variables such as syllables, words, clauses and sentences. Our purpose is to match metrics with scopes and to investigate how these scopes correlate with different proficiency levels. We follow a supervised learning approach in which we test metrics of different scopes against the scores obtained by students on the DIALANG test (Alderson and Huhta 2005), used as a proxy for the CEFR. We test the typology by classifying 282 texts written by French learners of English. The data processing pipeline relies on the R package {quanteda} (Benoit et al. 2018) and Lu’s L2SCA (Lu 2010) to compute a range of metrics per text. We apply the randomForest modeling method to classify texts according to level. When classifying texts across the six classes on the test set, results are mixed, with a mean accuracy of 55.35%. When classifying according to three aggregated levels (A, B and C), accuracy is 75%, with most confusion between the A and B levels. We explain the model by extracting important variables with the Gini index measure.
Results show that the Root, Corrected and Log TTR, Complex Nominals (CN), dependent clauses per clause, number of words, number of sentences and Yule’s K metrics have the highest importance at the crucial B level, i.e. that of the independent user. These metrics relate to scopes with fine-grained attributes such as text size, type repetition in texts, word variation in texts or specific constituents in sentences. With such scopes, it is possible to provide more meaningful feedback to learners.
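
As an illustration of the text-scope lexical metrics named above, the following is a minimal Python sketch of the commonly cited formulae for Root TTR, Corrected TTR, Log TTR and Yule’s K. It is not the authors’ pipeline, which relies on {quanteda} in R; the function name `lexical_diversity`, the whitespace tokenisation and the lowercasing are assumptions made for this example, and the exact variants computed in the study may differ.

```python
import math
from collections import Counter


def lexical_diversity(tokens):
    """Return type/token-based diversity metrics for a list of word tokens.

    Hypothetical helper for illustration only; textbook formulae,
    with N = number of tokens and V = number of types.
    """
    n_tokens = len(tokens)
    if n_tokens < 2:
        # Log TTR divides by log(N), which is zero when N == 1
        raise ValueError("need at least two tokens")

    # Assumption: types are case-insensitive word forms
    freqs = Counter(t.lower() for t in tokens)
    n_types = len(freqs)

    # Sum of squared type frequencies, needed for Yule's K
    sum_sq = sum(f * f for f in freqs.values())

    return {
        "ttr": n_types / n_tokens,                           # V / N
        "root_ttr": n_types / math.sqrt(n_tokens),           # V / sqrt(N)
        "corrected_ttr": n_types / math.sqrt(2 * n_tokens),  # V / sqrt(2N)
        "log_ttr": math.log(n_types) / math.log(n_tokens),   # log V / log N
        "yules_k": 1e4 * (sum_sq - n_tokens) / n_tokens ** 2,
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat".split()
    for name, value in lexical_diversity(sample).items():
        print(f"{name}: {value:.3f}")
```

All four metrics depend only on type and token counts over the whole text, which is why they fall under text-level scopes such as word variation and type repetition rather than sentence-level scopes.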

Contributor : Thomas Gaillat
Submitted on : Tuesday, March 3, 2020 - 10:08:52 AM
Last modification on : Friday, March 27, 2020 - 3:21:37 AM


  • HAL Id : hal-02496571, version 1


Nicolas Ballier, Thomas Gaillat. Investigating the scope of textual metrics for learner level discrimination and learner analytics. Learner Corpus Research Conference, University of Warsaw, Poland, Sep 2019, Warsaw, Poland. ⟨hal-02496571⟩


