Measuring text readability with machine comprehension: a pilot study

Marc Benzahra; François Yvon

Conference paper, Year: 2019

Marc Benzahra

Role: Author
PersonId : 1052435

Traitement du Langage Parlé

François Yvon

Role: Author
PersonId : 5347
IdHAL : francois-yvon
ORCID : 0000-0002-7972-7442
IdRef : 057593531

Traitement du Langage Parlé

Abstract

This article studies the relationship between text readability indices and automatic machine understanding systems. Our hypothesis is that the simpler a text is, the better it should be understood by a machine. We thus expect a strong correlation between readability levels on the one hand, and performance of automatic reading systems on the other hand. We test this hypothesis with several understanding systems based on language models of varying strengths, measuring this correlation on two corpora of journalistic texts. Our results suggest that this correlation is rather small and that existing comprehension systems are far from reproducing the gradual improvement in performance expected on texts of decreasing complexity.

Keywords

Machine comprehension; Text readability; Neural language models

Domains

Computer Science [cs]; Computation and Language [cs.CL]

Main file

document(5).pdf (363.05 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-02267546

Submitted on: Monday, August 19, 2019, 14:18:36

Last modified on: Saturday, October 7, 2023, 21:36:21

Long-term archiving on: Thursday, January 9, 2020, 14:55:29

Dates and versions

hal-02267546 , version 1 (19-08-2019)

Identifiers

HAL Id: hal-02267546, version 1

Cite

Marc Benzahra, François Yvon. Measuring text readability with machine comprehension: a pilot study. Workshop on Building Educational Applications Using NLP, Aug 2019, Florence, Italy. pp. 412-422. ⟨hal-02267546⟩

Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-TLP

217 views

287 downloads
