The Quaero Evaluation Campaign on Term Extraction

Thibault Mondary; Adeline Nazarenko; Haifa Zargayouna; Sabine Barreaux

Conference paper, Year: 2012

Thibault Mondary

Role: Author
PersonId : 840563

Laboratoire d'Informatique de Paris-Nord

Adeline Nazarenko

Role: Author
PersonId : 830553

Laboratoire d'Informatique de Paris-Nord

Haifa Zargayouna

Role: Author
PersonId : 12620
IdHAL : haifa-zargayouna
ORCID : 0000-0002-2482-3074
IdRef : 107985004

Laboratoire d'Informatique de Paris-Nord

Sabine Barreaux

Role: Author
PersonId : 1072446
IdHAL : sabine-barreaux

Institut de l'information scientifique et technique

Abstract

The Quæro program organized a set of evaluations of terminology extraction systems in 2010 and 2011. The initiative had three objectives: first, to evaluate the behavior and scalability of term extractors with respect to corpus size; second, to assess progress between different versions of the same systems; and third, to measure the influence of corpus type. The protocol was a comparative analysis of 32 runs against a gold standard. Scores were computed using metrics that take gradual relevance into account. Systems produced by Quæro partners and publicly available systems were evaluated on pharmacology corpora composed of European patents or abstracts of scientific articles, all in English. The gold standard was an unstructured version of the pharmacology thesaurus used by INIST-CNRS for indexing purposes. Most systems scaled to large corpora, differences between versions of the same systems were mixed, and results were better on scientific articles than on patents. During the ongoing adjudication phase, domain experts are enriching the thesaurus with terms found by several systems.

Keywords

Term extraction; Evaluation; Quæro; Pharmacology; Gradual relevance; Scalability

Domains

Artificial Intelligence [cs.AI]; Document and Text Processing

Main file

lrec_en_2012.pdf (295.39 KB)

Origin: Files produced by the author(s)

Contributor: Haifa Zargayouna

https://hal.science/hal-00699356

Submitted on: Sunday, 20 May 2012, 19:14:21

Last modified on: Friday, 24 March 2023, 14:52:55

Long-term archiving on: Thursday, 15 December 2016, 08:09:29

Dates and versions

hal-00699356, version 1 (20-05-2012)

Licence

Attribution

Identifiers

HAL Id: hal-00699356, version 1

Cite

Thibault Mondary, Adeline Nazarenko, Haifa Zargayouna, Sabine Barreaux. The Quaero Evaluation Campaign on Term Extraction. The 8th International Conference on Language Resources and Evaluation (LREC), May 2012, Istanbul, Turkey. pp. 663-669. ⟨hal-00699356⟩

Collections

UNIV-PARIS13 CNRS LIPN GALILE SORBONNE-PARIS-NORD INIST

125 views

145 downloads
