Categorizing students' questions using an ensemble hybrid approach - HAL open archive
Conference paper, Year: 2019

Categorizing students' questions using an ensemble hybrid approach

Abstract

Categorizing students' questions is a challenging task: the available corpora are often limited in size (particularly for languages other than English), and training the classifiers requires costly preliminary manual annotation. Ensemble learning can help improve machine learning results by combining several models, and is particularly effective at leveraging the strengths of very different classifiers. In this paper, we investigate how combining a rule-based annotator (based on keywords identified by an expert) with various machine learning-based approaches and TF-IDF can improve the automated identification of questions asked by first-year medical students on an online platform, according to a coding scheme with 4 dimensions. First, we evaluated the performance of several models by calculating, for each dimension, the kappa between their predictions and the manually labelled dataset. Then, using a stacking approach, we tried different combinations of these models to design a predictive model with higher performance. The results reveal that the new ensemble models can help increase performance for all dimensions of the dataset, in particular those for which the expert rule-based system showed the lowest performance. These results are promising, as they indicate that some easy-to-train models can complement more manual approaches, even with a small training set of a few hundred annotated questions.
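
The two technical steps named in the abstract (scoring each model with kappa, then stacking) can be illustrated with a minimal sketch. This is not the authors' implementation. The questions, the two labels ("definition" and "other"), the keyword rule, the length-based stand-in for a trained classifier, and the lookup-table meta-learner are all hypothetical, chosen only so the example runs with the Python standard library. The paper's actual coding scheme, features (TF-IDF) and classifiers are not reproduced here.

```python
from collections import Counter, defaultdict

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both sequences use one identical label
    return (observed - expected) / (1 - expected)

def rule_based(question):
    """Hypothetical keyword rule standing in for the expert annotator."""
    return "definition" if "what is" in question.lower() else "other"

def length_based(question):
    """Hypothetical stand-in for a trained machine-learning classifier."""
    return "definition" if len(question.split()) <= 4 else "other"

class LookupStacker:
    """Toy meta-learner: maps each tuple of base predictions to its majority label."""

    def __init__(self, base_models):
        self.base_models = base_models
        self.table = {}
        self.default = None

    def _features(self, question):
        # The meta-learner only sees the base models' outputs, not the text.
        return tuple(model(question) for model in self.base_models)

    def fit(self, questions, labels):
        votes = defaultdict(Counter)
        for q, y in zip(questions, labels):
            votes[self._features(q)][y] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        self.default = Counter(labels).most_common(1)[0][0]  # fallback for unseen tuples
        return self

    def predict(self, questions):
        return [self.table.get(self._features(q), self.default) for q in questions]

# Invented toy data (assumption: a single binary dimension).
questions = [
    "What is a neuron?",
    "What is the role of insulin in glucose regulation?",
    "Explain osmosis please",
    "Why does the heart beat faster during exercise?",
    "What is ATP?",
    "Can you repeat that?",
]
labels = ["definition", "other", "other", "other", "definition", "other"]

stacker = LookupStacker([rule_based, length_based]).fit(questions, labels)
kappa_rule = cohen_kappa(labels, [rule_based(q) for q in questions])
kappa_length = cohen_kappa(labels, [length_based(q) for q in questions])
kappa_stack = cohen_kappa(labels, stacker.predict(questions))
print(kappa_rule, kappa_length, kappa_stack)
```

In this toy setup each base annotator makes different mistakes, so the meta-learner can correct them by looking at both outputs together. The kappa values here are computed on the same questions used to fit the meta-learner, which is optimistic. With trainable base models, a stacking setup would normally feed the meta-learner out-of-fold predictions and report kappa on held-out questions.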
Main file
EDM_19_short_final.pdf (224.78 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02157331, version 1 (26-08-2019)

Identifiers

  • HAL Id: hal-02157331, version 1

Cite

Fatima Harrak, François Bouchet, Vanda Luengo. Categorizing students' questions using an ensemble hybrid approach. Educational Data Mining, Jul 2019, Montréal, Canada. ⟨hal-02157331⟩