Journal article. Knowledge and Information Systems (KAIS). Year: 2013

An automatic keyphrase extraction system for scientific documents

Abstract

Automatic keyphrase extraction techniques play an important role in many tasks, including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location of potential keyphrases: a new candidate phrase generation method is proposed, based on the core word expansion algorithm, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency related feature is introduced for selecting the proper granularity. Some other new features are also added. Several experiments were carried out to evaluate the described system. The experimental results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with that of other known keyphrase extraction systems.
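
The overlap elimination idea in point (2) can be illustrated with a minimal sketch. The abstract does not define the inverse document frequency related feature, so everything here is an assumption made for illustration only: the smoothed IDF formula, the mean-IDF score in `phrase_idf`, the rule of keeping whichever nested phrase scores higher, and all function names. It is not the authors' method.

```python
import math
from collections import Counter

def idf_table(docs):
    """Smoothed inverse document frequency for each word in a tokenized corpus."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def is_subphrase(short, long):
    """True if the words of `short` occur contiguously inside the longer `long`."""
    s, l = short.split(), long.split()
    return len(s) < len(l) and any(
        l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1)
    )

def phrase_idf(phrase, idf, default=1.0):
    """Hypothetical granularity score (an assumption): mean IDF of the words."""
    words = phrase.split()
    return sum(idf.get(w, default) for w in words) / len(words)

def eliminate_overlaps(candidates, idf):
    """For every nested pair of candidates, drop the one with the lower score."""
    dropped = set()
    for a in candidates:
        for b in candidates:
            if is_subphrase(a, b):
                # a is nested in b: ties are resolved in favor of the longer phrase.
                loser = a if phrase_idf(a, idf) <= phrase_idf(b, idf) else b
                dropped.add(loser)
    return [c for c in candidates if c not in dropped]

# Toy corpus of tokenized documents (made-up data, for illustration only).
docs = [
    ["keyphrase", "extraction", "system"],
    ["information", "extraction"],
    ["operating", "system"],
    ["extraction", "of", "features"],
]
candidates = ["keyphrase extraction", "extraction", "system"]
result = eliminate_overlaps(candidates, idf_table(docs))
print(result)  # ['keyphrase extraction', 'system']
```

In this toy corpus "extraction" appears in three of the four documents, so its IDF is low and the longer phrase "keyphrase extraction" is kept in its place. With a different corpus the shorter phrase could win, which is the granularity choice the abstract refers to.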

Dates and versions

hal-00849839 , version 1 (01-08-2013)

Cite

Wei You, Dominique Fontaine, Jean-Paul Barthès. An automatic keyphrase extraction system for scientific documents. Knowledge and Information Systems (KAIS), 2013, 34 (3), pp.691-724. ⟨10.1007/s10115-012-0480-2⟩. ⟨hal-00849839⟩