Amélioration de la similarité sémantique vectorielle par méthodes non-supervisées

English title : Improved the Semantic Similarity with Weighting Vectors

Abstract : Semantic textual similarity underlies many applications and plays an important role in areas such as information retrieval, plagiarism detection, information extraction and machine translation. This article proposes an innovative word embedding-based system for computing the semantic similarity between sentences. The main idea is to exploit the representation of words as vectors in a multidimensional space, which captures their semantic and syntactic properties. IDF weighting and part-of-speech tagging are applied to the examined sentences to help identify the words that are most descriptive in each sentence. The performance of the proposed system is confirmed through the Pearson correlation between its assigned semantic similarity scores and human judgments on a state-of-the-art dataset of Arabic sentences.
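The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the smoothed IDF variant, the multiplicative part-of-speech weights, and every name used here (`idf_weights`, `sentence_vector`, `pos_weights`, the toy vectors) are assumptions chosen for illustration. The sketch builds a sentence vector as a weighted sum of word vectors, compares two sentences by cosine similarity, and measures agreement with human scores by Pearson correlation.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed IDF per word over a tokenized corpus (assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter(w for sentence in corpus for w in set(sentence))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def sentence_vector(tokens, embeddings, idf, pos_tags=None, pos_weights=None):
    """Weighted sum of word vectors; unknown words are skipped."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for i, tok in enumerate(tokens):
        if tok not in embeddings:
            continue
        weight = idf.get(tok, 1.0)
        # Hypothetical POS weighting: scale by a per-tag factor if provided.
        if pos_tags is not None and pos_weights is not None:
            weight *= pos_weights.get(pos_tags[i], 1.0)
        for d, x in enumerate(embeddings[tok]):
            vec[d] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented for this example only.
    embeddings = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "eats": [0.1, 1.0]}
    corpus = [["cat", "eats"], ["dog", "eats"]]
    idf = idf_weights(corpus)
    v1 = sentence_vector(corpus[0], embeddings, idf)
    v2 = sentence_vector(corpus[1], embeddings, idf)
    print(cosine(v1, v2))
```

In practice the embeddings would come from a pretrained model and the part-of-speech tags from a tagger for the target language; both are left as plain dictionaries and lists here so that the sketch stays self-contained.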
Document type :
Conference papers

Cited literature : 14 references

https://hal.archives-ouvertes.fr/hal-01531886
Contributor : Jérémy Ferrero
Submitted on : Friday, June 2, 2017 - 9:39:39 AM
Last modification on : Tuesday, February 12, 2019 - 1:31:23 AM
Document(s) archived on : Wednesday, December 13, 2017 - 8:15:33 AM

File

TALN_2017_paper_52.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01531886, version 1

Citation

El Moatez Billah Nagoudi, Jérémy Ferrero, Didier Schwab. Amélioration de la similarité sémantique vectorielle par méthodes non-supervisées. 24e conférence sur le Traitement Automatique des Langues Naturelles (TALN 2017), Jun 2017, Orléans, France. ⟨hal-01531886⟩

Metrics

Record views : 329
File downloads : 954