Semantic Similarity of Arabic Sentences with Word Embeddings

El Moatez Billah Nagoudi; Didier Schwab

Conference paper, Year: 2017


El Moatez Billah Nagoudi

Role: Author
PersonId: 1026552

Laboratoire d'Informatique de Grenoble

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Didier Schwab

Role: Author
PersonId: 4261
IdHAL: didier-schwab
ORCID: 0000-0002-2462-8148
IdRef: 069192359

Université Grenoble Alpes [2016-2019]

Laboratoire d'Informatique de Grenoble

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Abstract

Semantic textual similarity is the basis of countless applications and plays an important role in diverse areas such as information retrieval, plagiarism detection, information extraction and machine translation. This article proposes an innovative word embedding-based system designed to compute the semantic similarity between Arabic sentences. The main idea is to represent words as vectors in a multidimensional space in order to capture their semantic and syntactic properties. IDF weighting and Part-of-Speech tagging are applied to the examined sentences to help identify the words that are most descriptive in each sentence. The performance of the proposed system is confirmed through the Pearson correlation between the semantic similarity scores it assigns and human judgments.
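The abstract combines three ingredients: word vectors, IDF and Part-of-Speech weighting, and Pearson correlation against human scores. The sketch below shows one simple way these could fit together, assuming pre-trained embeddings are available as a plain dict from token to vector and that sentences are already tokenized and POS-tagged. It scores a sentence pair by the cosine similarity of IDF- and POS-weighted sums of word vectors; this is a generic, swapped-in technique and not necessarily the authors' exact scoring function. The names `idf_weights`, `sentence_vector`, `sentence_similarity`, `pearson` and the `POS_WEIGHTS` values are hypothetical, and the toy vectors and scores are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical POS weights: content words count more than function words.
# These values are illustrative assumptions, not taken from the paper.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 1.0, "ADJ": 0.8, "ADV": 0.6, "DET": 0.1, "PREP": 0.1}


def idf_weights(corpus):
    """IDF(w) = log(N / df(w)) over a corpus of tokenized sentences."""
    n = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each word at most once per sentence
    return {w: math.log(n / c) for w, c in df.items()}


def sentence_vector(tokens, tags, embeddings, idf, pos_weights=POS_WEIGHTS):
    """Weighted sum of word vectors, weight = IDF(w) * POS weight of w's tag."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, tag in zip(tokens, tags):
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        weight = idf.get(word, 0.0) * pos_weights.get(tag, 1.0)
        for k, x in enumerate(embeddings[word]):
            vec[k] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def sentence_similarity(tokens1, tags1, tokens2, tags2, embeddings, idf,
                        pos_weights=POS_WEIGHTS):
    """Cosine of the two weighted sentence vectors (scale-invariant, so sum = average)."""
    v1 = sentence_vector(tokens1, tags1, embeddings, idf, pos_weights)
    v2 = sentence_vector(tokens2, tags2, embeddings, idf, pos_weights)
    return cosine(v1, v2)


def pearson(xs, ys):
    """Pearson correlation between system scores and human judgments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real pre-trained Arabic embeddings
    # (transliterated tokens; glosses in comments).
    emb = {
        "kataba": [0.9, 0.1, 0.0],   # "wrote"
        "dawwana": [0.8, 0.2, 0.1],  # "noted down"
        "alwalad": [0.1, 0.9, 0.2],  # "the boy"
        "altifl": [0.2, 0.8, 0.3],   # "the child"
        "aldars": [0.0, 0.2, 0.9],   # "the lesson"
    }
    s1, t1 = ["kataba", "alwalad", "aldars"], ["VERB", "NOUN", "NOUN"]
    s2, t2 = ["dawwana", "altifl", "aldars"], ["VERB", "NOUN", "NOUN"]
    idf = idf_weights([s1, s2, ["kataba", "aldars"]])
    print(f"similarity = {sentence_similarity(s1, t1, s2, t2, emb, idf):.3f}")

    # Invented system scores vs. invented human ratings for three sentence pairs.
    print(f"pearson = {pearson([0.82, 0.35, 0.10], [4.6, 2.1, 0.5]):.3f}")
```

Note that a word occurring in every corpus sentence gets IDF = 0 and therefore no influence, which is exactly the down-weighting of non-descriptive words that IDF is meant to provide; the POS weights play a similar role for function words.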

Keywords

Vector Model Space; Word Representations; Semantic Sentences Similarity; Word Embedding

Domains

Artificial Intelligence [cs.AI]; Computation and Language [cs.CL]; Document and Text Processing

Main file

W17-1303.pdf (208.55 KB)

Origin: Publisher files allowed on an open archive

Contributor: Didier Schwab

https://hal.science/hal-01683485

Submitted on: Monday, 15 January 2018, 15:43:07

Last modified on: Thursday, 4 April 2024, 21:18:39

Long-term archiving on: Monday, 7 May 2018, 11:30:20

Dates and versions

hal-01683485, version 1 (15-01-2018)

Identifiers

HAL Id: hal-01683485, version 1

Cite

El Moatez Billah Nagoudi, Didier Schwab. Semantic Similarity of Arabic Sentences with Word Embeddings. Third Arabic Natural Language Processing Workshop, Apr 2017, Valencia, Spain. pp. 18-24. ⟨hal-01683485⟩


Collections

UGA CNRS LIG LIG_TDCGE_GETALP LIG_SIDCH

581 views

866 downloads
