
Large-Scale Evaluation of Keyphrase Extraction Models

Abstract: Keyphrase extraction models are usually evaluated under different, not directly comparable, experimental setups. As a result, it remains unclear how well proposed models actually perform, and how they compare to each other. In this work, we address this issue by presenting a systematic large-scale analysis of state-of-the-art keyphrase extraction models involving multiple benchmark datasets from various sources and domains. Our main results reveal that state-of-the-art models are in fact still challenged by simple baselines on some datasets. We also present new insights about the impact of using author- or reader-assigned keyphrases as a proxy for gold standard, and give recommendations for strong baselines and reliable benchmark datasets.
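As an illustration (not taken from the paper), one kind of "simple baseline" the abstract alludes to can be approximated by ranking frequent stopword-free candidate n-grams. A minimal, stdlib-only sketch, in which the stopword list and function names are hypothetical:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is",
             "are", "for", "we", "by", "with", "that", "this"}

def candidate_ngrams(text, max_len=3):
    """Yield contiguous word n-grams (up to max_len) containing no stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) == n and not any(w in STOPWORDS for w in gram):
                yield " ".join(gram)

def tf_baseline(text, k=5):
    """Rank candidates by raw frequency -- a rough stand-in for the
    frequency-based baselines that evaluations of this kind compare against."""
    counts = Counter(candidate_ngrams(text))
    return [phrase for phrase, _ in counts.most_common(k)]
```

For example, `tf_baseline("keyphrase extraction models are evaluated on keyphrase extraction benchmarks")` ranks the repeated phrase "keyphrase extraction" among its top candidates. Real baselines (e.g., TF-IDF over a reference corpus) add document-frequency weighting, but the candidate-generation-then-ranking structure is the same.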

Cited literature: 51 references
Contributor: Florian Boudin
Submitted on: Tuesday, June 23, 2020 - 1:36:20 PM
Last modification on: Wednesday, November 3, 2021 - 4:20:33 AM
Long-term archiving on: Thursday, September 24, 2020 - 5:06:01 PM


Files produced by the author(s)



Ygor Gallina, Florian Boudin, Béatrice Daille. Large-Scale Evaluation of Keyphrase Extraction Models. ACM/IEEE Joint Conference on Digital Libraries (JCDL), Aug 2020, Wuhan, China. ⟨10.1145/1122445.1122456⟩. ⟨hal-02878953⟩


