A corpus of real-life questions for evaluating robustness of QA systems

Laurianne Sitbon; Patrice Bellot; Philippe Blache

Conference paper, Year: 2008

Laurianne Sitbon

Role: Author

Laboratoire Informatique d'Avignon

Patrice Bellot

Role: Author
PersonId : 14204
IdHAL : patrice-bellot
ORCID : 0000-0001-8698-5055
IdRef : 079380956

Laboratoire Informatique d'Avignon

Philippe Blache

Role: Author
PersonId : 12345
IdHAL : philippe-blache
ORCID : 0000-0002-5216-9591
IdRef : 056623771

Laboratoire Parole et Langage

Abstract

Many evaluation campaigns for question answering (QA) systems have been organized over the years. The international TREC conference includes a QA track, the European CLEF campaign offers cross-language evaluation in eight languages, NTCIR includes a QA track in three languages, and the French Technolangue EQUER evaluation focused on 500 corpus-based questions. However, the questions asked in those campaigns are checked to be well formed and sufficiently complete. We aim to test QA systems on their ability to answer questions typed spontaneously by people who do not think much about the grammatical and lexical forms they use. This is designed to test QA system robustness under more realistic conditions of use. Related experiments in the document retrieval field have tested the robustness of search engines with automatic transcriptions of spoken queries (Crestani, 2000) or with automatically degraded text entries (Ruch, 2002). Moreover, we aim to test our QA system (Gillard et al., 2006) with questions written by dyslexic adults or children, or by non-native speakers. These users are intuitively the ones most affected by such robustness problems.

1 Corpus constitution

A web-based approach was used to acquire data for the corpus. The motivation for this approach is twofold. First, it lets users take part in the experiment in relaxed conditions, when and where they wish. Second, it makes it possible to collect data from a wide population, especially from dyslexic individuals, who are already solicited for psycholinguistic experiments. It removes the geographical constraints that often limit the quantity of data in this area. The experiment consists of 20 questions selected from the French EQUER evaluation campaign. The selected questions were among those correctly answered by SQuALIA (Gillard et al., 2006). Eight of them contain proper nouns. Two of them contain low-frequency foreign proper nouns.

The question focuses covered are: person name (5 questions), number (5 questions), date (3 questions), location (2 questions), money, distance, age, journal name and military rank.
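The focus distribution above can be checked with a short tally. This is only an illustrative sketch, not part of the authors' tooling: the abstract gives no count for money, distance, age, journal name and military rank, so the code assumes one question each, which is the only assignment that brings the total to the stated 20 questions. The names `focus_counts` and `focus_shares` are hypothetical.

```python
from collections import Counter

# Counts stated in the abstract. The last five categories are listed
# without a count; ONE question each is an ASSUMPTION, chosen because
# it makes the total match the 20 questions of the experiment.
focus_counts = Counter({
    "person name": 5,
    "number": 5,
    "date": 3,
    "location": 2,
    "money": 1,          # assumed
    "distance": 1,       # assumed
    "age": 1,            # assumed
    "journal name": 1,   # assumed
    "military rank": 1,  # assumed
})


def focus_shares(counts):
    """Return the fraction of questions that falls in each focus category."""
    total = sum(counts.values())
    return {focus: n / total for focus, n in counts.items()}


total_questions = sum(focus_counts.values())
shares = focus_shares(focus_counts)

print(total_questions)
for focus, share in shares.items():
    print(f"{focus}: {share:.0%}")
```

Under this assumption, person names and numbers together account for half of the questions.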

Domains

Computer Science [cs]

https://hal.science/hal-01321115

Submitted on: Wednesday, May 25, 2016, 08:37:03

Last modified on: Friday, March 24, 2023, 14:53:02

Dates and versions

hal-01321115 , version 1 (25-05-2016)

Identifiers

HAL Id : hal-01321115 , version 1

Cite

Laurianne Sitbon, Patrice Bellot, Philippe Blache. A corpus of real-life questions for evaluating robustness of QA systems. LREC , May 2008, Marrakech, Morocco. ⟨hal-01321115⟩

Collections

UNIV-AVIGNON CNRS UNIV-AMU LPL-AIX LIA
