Étiquetage thématique automatisé de corpus par représentation sémantique (Automated thematic tagging of corpora using semantic representation)

Abstract : In scientific text corpora, some articles from different research communities are not tagged with the same keywords even though they share the same topic. This causes problems for information retrieval systems that rely on a limited number of tag variations, and it lowers the chances of interdisciplinary exploration. Our approach automatically assigns a topic tag to articles by learning one classifier per topic from the semantic representation of the titles and abstracts of already tagged articles. The approach requires much less computing power than applying topic modeling to millions of documents. In our proposed model, we also use topic synonyms to retrieve more semantically similar articles and merge them with the articles obtained by the topic classifier. The experiments show higher recall than two variants of the model: one that uses only the synonym set, and one that uses only the semantic representation of the text.
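
The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which semantic representation or classifier is used, so a bag-of-words vector and a centroid with a cosine threshold stand in for them here. All function names, the threshold value, and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a semantic representation: bag-of-words counts (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_topic_classifier(tagged_texts):
    """Hypothetical per-topic classifier: centroid of already tagged articles."""
    centroid = Counter()
    for text in tagged_texts:
        centroid.update(embed(text))
    return centroid


def classify(centroid, articles, threshold=0.3):
    """Ids of articles whose vector is close enough to the topic centroid."""
    return {aid for aid, text in articles.items()
            if cosine(centroid, embed(text)) >= threshold}


def synonym_search(synonyms, articles):
    """Ids of articles whose text contains at least one topic synonym."""
    return {aid for aid, text in articles.items()
            if any(s.lower() in text.lower() for s in synonyms)}


def tag_articles(tagged_texts, synonyms, articles, threshold=0.3):
    """Merge (set union) classifier results with synonym-based results."""
    centroid = train_topic_classifier(tagged_texts)
    return classify(centroid, articles, threshold) | synonym_search(synonyms, articles)


# Toy data, invented for illustration only.
tagged = [
    "Deep neural networks for image classification",
    "Training convolutional neural networks on images",
]
synonyms = ["neural network", "connectionist model"]
articles = {
    "a1": "Convolutional networks for image recognition",
    "a2": "Connectionist models of visual perception",
    "a3": "Soil erosion in alpine valleys",
}

print(sorted(tag_articles(tagged, synonyms, articles)))
```

In this toy corpus the classifier alone and the synonym search alone each retrieve a different article, so their union covers more relevant articles than either variant. That is the intuition behind the recall comparison reported in the abstract.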

https://hal.archives-ouvertes.fr/hal-01659639
Contributor : Fabrice Muhlenbach
Submitted on : Friday, December 8, 2017 - 4:08:05 PM
Last modification on : Wednesday, October 31, 2018 - 12:24:20 PM

File

Martinel_et_al__Etiquetage-the...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01659639, version 1

Citation

Lucie Martinet, Hussein Al-Natsheh, Fabien Rico, Fabrice Muhlenbach, Djamel Zighed. Étiquetage thématique automatisé de corpus par représentation sémantique. EGC 2018 - 18ème Conférence Internationale sur l'Extraction et la Gestion de Connaissances, Jan 2018, Paris-Nord, France. pp.1-6. ⟨hal-01659639⟩
