
Categorizing students' questions using an ensemble hybrid approach

Abstract : Categorizing students' questions is a challenging task, as the available corpora are often small (particularly for languages other than English) and training the classifiers requires costly preliminary manual annotation. Ensemble learning can improve machine learning results by combining several models, and is particularly effective at leveraging the strengths of very different classifiers. In this paper, we investigate how combining a rule-based annotator (based on keywords identified by an expert) with various TF-IDF-based machine learning approaches can improve the automated categorization of questions asked by first-year medical students on an online platform, according to a coding scheme with four dimensions. First, we evaluated the performance of several models on each dimension by computing the kappa between their predictions and the manually labelled dataset. Then, using a stacking approach, we tested different combinations of these models to design a predictive model with higher performance. The results show that the ensemble models can help increase performance on all dimensions of the dataset, particularly those on which the expert rule-based system performed worst. These results are promising, as they indicate that some easy-to-train models can complement more manual approaches, even with a small training set of a few hundred annotated questions.
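The abstract does not give implementation details, so the following is only a minimal, stdlib-only sketch of the general pipeline it describes, under explicit assumptions. The keyword lists and labels (`KEYWORDS`, `"definition"`, `"procedure"`, `"other"`) are hypothetical. A nearest-centroid classifier on TF-IDF vectors stands in for whichever machine learning models the authors actually used. The meta-learner (a lookup table mapping each pair of base predictions to the majority gold label, learned on out-of-fold predictions) is one simple form of stacking, not necessarily the one in the paper. Kappa is computed as Cohen's kappa, assuming that is the variant meant.

```python
import math
from collections import Counter, defaultdict

# Hypothetical expert keywords for ONE dimension of a coding scheme (illustrative only).
KEYWORDS = {
    "definition": ["what is", "define", "meaning"],
    "procedure": ["how do", "how to", "steps"],
}
DEFAULT_LABEL = "other"


def rule_predict(question):
    """Rule-based annotator: return the first label whose keyword occurs in the question."""
    q = question.lower()
    for label, words in KEYWORDS.items():
        if any(w in q for w in words):
            return label
    return DEFAULT_LABEL


def tokenize(text):
    return text.lower().replace("?", " ").split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in the training docs."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; terms unseen in training are ignored."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def fit_centroids(docs, labels, idf):
    """Sum the TF-IDF vectors per label (hypothetical stand-in for an ML classifier)."""
    sums = defaultdict(Counter)
    for d, y in zip(docs, labels):
        sums[y].update(vectorize(d, idf))
    return sums


def ml_predict(doc, idf, centroids):
    """Pick the label whose centroid has the highest cosine similarity with the doc."""
    v = vectorize(doc, idf)

    def cosine(c):
        norm = math.sqrt(sum(x * x for x in c.values())) or 1.0
        return sum(w * c.get(t, 0.0) for t, w in v.items()) / norm

    return max(centroids, key=lambda y: cosine(centroids[y]))


def fit_stack(docs, labels, k=3):
    """Stacking: learn a meta-classifier on out-of-fold predictions of both base models."""
    table = defaultdict(Counter)
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        idf = fit_idf([docs[i] for i in train])
        cents = fit_centroids([docs[i] for i in train], [labels[i] for i in train], idf)
        for i in test:
            pair = (rule_predict(docs[i]), ml_predict(docs[i], idf, cents))
            table[pair][labels[i]] += 1
    # Meta-classifier: majority gold label for each (rule, ML) prediction pair.
    meta = {pair: c.most_common(1)[0][0] for pair, c in table.items()}
    # Refit the base ML model on all data for use at prediction time.
    idf = fit_idf(docs)
    cents = fit_centroids(docs, labels, idf)

    def predict(doc):
        pair = (rule_predict(doc), ml_predict(doc, idf, cents))
        return meta.get(pair, pair[1])  # unseen pair: fall back to the ML prediction

    return predict


def cohen_kappa(a, b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)  # pe == 1: degenerate case
```

For example, `cohen_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"])` gives 0.5: observed agreement is 0.75 and chance agreement is 0.5. Training the meta-classifier only on out-of-fold base predictions keeps it from simply learning to trust whichever base model has memorised the training set, which matters with a training set of only a few hundred questions.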
Document type : Conference papers

Cited literature : 17 references
Contributor : Vanda Luengo
Submitted on : Monday, August 26, 2019 - 2:19:58 PM
Last modification on : Friday, December 3, 2021 - 11:42:45 AM
Long-term archiving on : Friday, January 10, 2020 - 4:48:49 AM


Files produced by the author(s)


  • HAL Id : hal-02157331, version 1


Fatima Harrak, François Bouchet, Vanda Luengo. Categorizing students' questions using an ensemble hybrid approach. Educational Data Mining, Jul 2019, Montréal, Canada. ⟨hal-02157331⟩


