Do we still Need Gold Standards for Evaluation?

Thierry Poibeau; Cédric Messiant

Conference paper. Year: 2008


Thierry Poibeau

Role: Author
PersonId : 472
IdHAL : thierry-poibeau
ORCID : 0000-0003-3669-4051
IdRef : 069992258

Laboratoire d'Informatique de Paris-Nord

Cédric Messiant

Role: Author

Laboratoire d'Informatique de Paris-Nord

Abstract

The availability of a huge mass of textual data in electronic format has increased the need for fast and accurate techniques for textual data processing. Machine learning and statistical approaches have been increasingly used in NLP since the 1990s, mainly because they are quick, versatile and efficient. However, despite this evolution of the field, evaluation still relies (most of the time) on a comparison between the output of a probabilistic or statistical system on the one hand, and a non-statistical, usually hand-crafted, gold standard on the other. To compare these two sets of data, which are inherently different in nature, it is first necessary to modify the statistical data so that they fit the hand-crafted reference. For example, a statistical parser, instead of producing a grammaticality score, will have to produce a binary value for each sentence (grammatical vs. ungrammatical) or a tree similar to the one stored in the treebank used as a reference. In this paper, we take the acquisition of subcategorization frames (SCFs) from corpora as a practical example. Our study is motivated by the fact that, even if a gold standard is an invaluable resource for evaluation, it is always partial and does not really show how accurate and useful the results are. We describe the task (SCF acquisition) and show how it is a typical NLP task. We then very briefly describe our SCF acquisition system before discussing different issues related to evaluation using a gold standard. Lastly, we adopt the classical distinction between intrinsic and extrinsic evaluation and show why this framework is relevant for SCF acquisition. We show that, even if intrinsic evaluation correlates with extrinsic evaluation, the two frameworks give complementary insights into the results. In the conclusion, we briefly discuss the case of other NLP tasks.
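The adaptation step described in the abstract (turning graded statistical output into categorical decisions so that it can be compared with a hand-crafted reference) can be illustrated with a minimal sketch. This is not the authors' system: the function names, the frame labels, the scores, the threshold value and the toy gold standard below are all hypothetical, chosen only to show how scored (verb, frame) pairs might be binarized and then scored with precision, recall and F-measure.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (verb, subcategorization frame label); labels are hypothetical


def binarize(scores: Dict[Pair, float], threshold: float) -> Set[Pair]:
    """Keep only the (verb, frame) pairs whose score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Compare a binarized system output with a gold standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy data: relative frequencies assigned by a statistical system.
system_scores: Dict[Pair, float] = {
    ("give", "NP_NP"): 0.62,
    ("give", "NP_PP"): 0.30,
    ("give", "NP"): 0.05,
    ("sleep", "INTRANS"): 0.90,
    ("sleep", "NP"): 0.04,
}

# Hypothetical hand-crafted reference: a frame is either listed or not.
gold_standard: Set[Pair] = {
    ("give", "NP_NP"),
    ("give", "NP_PP"),
    ("give", "NP"),
    ("sleep", "INTRANS"),
}

predicted = binarize(system_scores, threshold=0.1)  # threshold is an assumed value
precision, recall, f1 = precision_recall_f1(predicted, gold_standard)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this sketch the binarization discards the graded information (for instance, that one frame of "give" is scored twice as high as another), and a rare but attested frame that falls below the threshold is counted as a plain miss. Both effects are of the kind the abstract points to when it says that a gold standard comparison does not fully show how accurate and useful the results are.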

Domains

Artificial intelligence [cs.AI]; Linguistics; Computational linguistics

Main file

lrec08_eval.pdf (106.28 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-00321436

Submitted on: Sunday, 14 September 2008, 20:00:48

Last modified on: Friday, 24 March 2023, 14:52:50

Long-term archiving on: Friday, 4 June 2010, 11:20:27

Dates and versions

hal-00321436, version 1 (14-09-2008)

Identifiers

HAL Id: hal-00321436, version 1

Cite

Thierry Poibeau, Cédric Messiant. Do we still Need Gold Standards for Evaluation? Language Resource and Evaluation Conference, 2008, Morocco. ⟨hal-00321436⟩


Collections

UNIV-PARIS13 CNRS LIPN GALILE SORBONNE-PARIS-NORD

148 views

120 downloads
