Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions
Journal article, Data and Knowledge Engineering. Year: 2022

Abstract

This article presents a comparative study of supervised approaches to the automatic classification of encyclopedia articles written in French. Our dataset consists of 17 volumes of text from the Encyclopédie by Diderot and d'Alembert (1751-72), comprising about 70,000 articles. We combine text vectorization (bag-of-words and word embeddings) with machine learning methods, deep learning, and transformer architectures. In addition to evaluating these approaches, we review the classification predictions using a variety of quantitative and qualitative methods. The best model obtains an average F-score of 86% across 38 classes. Using network analysis, we highlight the difficulty of classifying semantically close classes. We also introduce examples of opportunities for qualitative evaluation of "misclassifications" in order to understand the relationship between content and different ways of ordering knowledge. We openly release all code and results obtained during this research.
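
To make the reported metric concrete, the following is a minimal sketch of how an average F-score over classes and a list of class-to-class confusions could be computed from gold and predicted labels. It is an illustration only, not the authors' released code: the abstract does not say which averaging scheme was used, so the unweighted (macro) average here is an assumption, and the function names and the three toy class labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from paired gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def confusion_edges(y_true, y_pred):
    """Count misclassifications as directed (gold, predicted) pairs.

    Such pairs can serve as weighted edges of a confusion graph, one
    possible starting point for a network-style analysis.
    """
    return Counter((g, p) for g, p in zip(y_true, y_pred) if g != p)


# Toy labels, purely illustrative (not taken from the paper's 38 classes).
y_true = ["Geography", "Geography", "History", "History", "Medicine", "Medicine"]
y_pred = ["Geography", "History", "History", "History", "Medicine", "Geography"]

print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
print(confusion_edges(y_true, y_pred))
```

Pairs with high counts in `confusion_edges` point to classes that a model tends to confuse, which is the kind of signal one would inspect when looking at semantically close classes.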
Main file
Classifying_encyclopedia_DKE_preprint_submitted.pdf (12.01 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03821073, version 1 (19 October 2022)

Cite

Alice Brenon, Ludovic Moncla, Katherine McDonough. Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. Data and Knowledge Engineering, 2022, 142, pp.102098. ⟨10.1016/j.datak.2022.102098⟩. ⟨hal-03821073⟩