Categorizing students' questions using an ensemble hybrid approach

Abstract : Categorizing students' questions is a challenging task because the available corpora are often small (particularly for languages other than English) and require costly manual annotation before classifiers can be trained. Ensemble learning can improve machine learning results by combining several models, and is particularly effective at leveraging the strengths of very different classifiers. In this paper, we investigate how combining a rule-based annotator (based on keywords identified by an expert) with various machine learning-based approaches and TF-IDF can improve the automated categorization of questions asked by first-year medical students on an online platform, according to a coding scheme with 4 dimensions. First, we evaluated the performance of several models by calculating, for each dimension, the kappa between their predictions and the manually labelled dataset. Then, using a stacking approach, we tried different combinations of these models to design a predictive model with higher performance. The results reveal that the new ensemble models increase performance on all dimensions of the dataset, in particular those for which the expert rule-based system showed the lowest performance. These results are promising, as they indicate that some easy-to-train models can complement more manual approaches, even with a small training set of a few hundred annotated questions.
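
The evaluation step described in the abstract (agreement between a model's predictions and the manual labels, measured by kappa) can be illustrated with a minimal sketch. It assumes that "kappa" means Cohen's kappa, which the abstract does not state explicitly. The keyword list, the example questions and the binary labels are hypothetical and are not taken from the paper; the paper's actual rules, data and stacking models are not reproduced here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotations agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from the marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences contain one identical label only
    return (observed - expected) / (1.0 - expected)


# Hypothetical keywords for one binary dimension (assumption, not from the paper).
KEYWORDS = ("why", "how come")


def rule_based_annotator(question):
    """Return 1 if any keyword occurs in the question, else 0."""
    text = question.lower()
    return 1 if any(keyword in text for keyword in KEYWORDS) else 0


# Toy questions and manual labels, invented for illustration only.
questions = [
    "Why does the enzyme stop working?",
    "What is the definition of osmosis?",
    "How come the pressure drops here?",
    "Can you explain the reason for this result?",
    "Is slide 12 correct?",
    "Where is the course outline?",
]
manual = [1, 0, 1, 1, 0, 0]

predicted = [rule_based_annotator(q) for q in questions]
kappa = cohen_kappa(manual, predicted)
print(f"kappa = {kappa:.3f}")
```

In this toy example the keyword rule misses the fourth question, which expresses the same intent without any listed keyword. Gaps of this kind are what a learned classifier over TF-IDF features could, in principle, help cover when combined with the rules.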
Document type :
Conference papers
Cited literature [17 references]

https://hal.archives-ouvertes.fr/hal-02157331
Contributor : Vanda Luengo
Submitted on : Monday, August 26, 2019 - 2:19:58 PM
Last modification on : Tuesday, September 17, 2019 - 10:37:45 AM

File

EDM_19_short_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02157331, version 1

Citation

Fatima Harrak, François Bouchet, Vanda Luengo. Categorizing students' questions using an ensemble hybrid approach. Educational Data Mining, Jul 2019, Montréal, Canada. ⟨hal-02157331⟩
