Investigating the scope of textual metrics for learner level discrimination and learner analytics

Nicolas Ballier; Thomas Gaillat

Conference paper. Year: 2019

Nicolas Ballier

Role: Author
PersonId : 7391
IdHAL : nicolas-ballier
ORCID : 0000-0003-2179-1043
IdRef : 057712409

Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus

Thomas Gaillat

Role: Author
PersonId : 13850
IdHAL : thomas-gaillat
ORCID : 0000-0003-3433-6533
IdRef : 235272574

Linguistique, Ingénierie, Didactique des Langues

Abstract

This paper focuses on textual metrics that can be used as criterial features in ICALL systems. Empirical approaches to learner corpora include the identification of criterial features linked to learners’ proficiency levels. To develop ICALL systems that give feedback on proficiency level, it is necessary to identify which metrics significantly discriminate learners at a given stage (Crossley et al. 2011; Hawkins and Filipović 2012; Arnold et al. 2018; Pilán and Volodina 2018). However, the metrics also need to be intuitive for learners in their meta-cognitive learning processes. Our research question concerns the significance of a scope-oriented taxonomy of metrics. To that end, we propose a fine-grained taxonomy based on the scope of the metrics, to support feedback at the word, sentence or text level. The formulae of the metrics rely on different types of variables such as syllables, words, clauses and sentences. Our purpose is to match metrics with scopes and to investigate how these scopes correlate with different proficiency levels. We follow a supervised learning approach in which we test metrics of different scopes against the scores obtained by students on the DIALANG test (Alderson and Huhta 2005), used as a proxy for the CEFR. We test the typology by classifying 282 texts written by French learners of English. The data processing pipeline relies on the {quanteda} R package (Benoit et al. 2018) and Lu’s L2SCA (Lu 2010) to compute a range of metrics per text. We apply the randomForest modeling method to classify texts by level. When classifying texts across the six classes on the test set, results are mixed, with a mean accuracy of 55.35%. When classifying according to three aggregated levels (A, B and C), accuracy is 75%, with most confusion between the A and B levels. We explain the model by extracting important variables with the Gini index measure.
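As a rough illustration of the Gini-based importance mentioned above: in a random forest, a variable's Gini importance aggregates the decrease in Gini impurity over the splits made on that variable. The Python sketch below computes that decrease for a single split on one metric. It is not the authors' R pipeline; the function names, the toy word counts and the threshold are all hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting the texts on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical toy data: word counts of six texts and their aggregated levels.
word_counts = [80, 95, 150, 160, 240, 260]
levels = ["A", "A", "B", "B", "C", "C"]

# A split at 100 words isolates the two A-level texts from the rest.
print(gini_decrease(word_counts, levels, threshold=100))
```

A metric whose splits produce large decreases across many trees ends up ranked as important, which is the sense in which the abstract speaks of variable importance.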
Results show that the Root, Corrected and log TTR, Complex Nominals (CN), dependent clauses per clause, number of words and sentences, and Yule’s K metrics have the highest importance at the crucial B level, i.e. that of the independent user. These metrics relate to scopes with fine-grained attributes such as text size, type repetition in texts, word variation in texts or specific constituents in sentences. With such scopes, it is possible to provide more meaningful feedback to learners.
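The lexical diversity metrics named above have standard definitions in terms of the number of tokens N and the number of types V: Root TTR is V/√N, Corrected TTR is V/√(2N), log TTR is log V / log N, and Yule's K is 10^4 (Σ i² V_i − N) / N², where V_i is the number of types occurring i times. A minimal Python sketch follows, assuming naive lowercase whitespace tokenization. That tokenization and the function name are assumptions; the paper computes its metrics with {quanteda} in R, whose tokenization and exact variants may differ.

```python
import math
from collections import Counter


def lexical_diversity(text):
    """Return text-level lexical diversity metrics for a raw string."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokens = text.lower().split()
    n = len(tokens)  # N: number of tokens
    if n < 2:
        raise ValueError("need at least two tokens (log N must be non-zero)")
    freqs = Counter(tokens)
    v = len(freqs)  # V: number of types
    # Frequency spectrum: V_i = number of types that occur exactly i times.
    spectrum = Counter(freqs.values())
    sum_sq = sum(i * i * v_i for i, v_i in spectrum.items())
    return {
        "ttr": v / n,
        "root_ttr": v / math.sqrt(n),
        "corrected_ttr": v / math.sqrt(2 * n),
        "log_ttr": math.log(v) / math.log(n),
        "yules_k": 1e4 * (sum_sq - n) / (n * n),
    }


sample = "the cat saw the dog and the dog saw the cat"
for name, value in lexical_diversity(sample).items():
    print(f"{name}: {value:.3f}")
```

All five values are computed over the whole text, which is why they fall under the text-level scope in the taxonomy; higher Yule's K indicates more repetition, whereas higher TTR variants indicate more variation.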

Domains

Linguistics; Machine Learning [cs.LG]; Computation and Language [cs.CL]

LCR19 presentation.pdf (237.58 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02496571

Submitted on: Tuesday, 3 March 2020, 10:08:52

Last modified on: Monday, 7 March 2022, 14:44:04

Dates and versions

hal-02496571 , version 1 (03-03-2020)

Identifiers

HAL Id: hal-02496571, version 1

Cite

Nicolas Ballier, Thomas Gaillat. Investigating the scope of textual metrics for learner level discrimination and learner analytics. Learner Corpus Research Conference, University of Warsaw, Poland, Sep 2019, Warsaw, Poland. ⟨hal-02496571⟩

Collections

UNIV-PARIS7 UR2-HB USPC UNIV-RENNES2 UNIV-RENNES CLILLAC-ARP LIDILE UP-SOCIETES-HUMANITES

89 views

86 downloads
