Journal article, Data and Knowledge Engineering, Year: 2021

Learning English and Arabic Question Similarity with Siamese Neural Networks in Community Question Answering services

Abstract

In this paper, we tackle the task of similar question retrieval (QR), which is essential for Community Question Answering (cQA) and aims to retrieve historical questions that are semantically equivalent to new queries. As community archives grow and duplicated questions accumulate, the QR problem has become increasingly challenging, both because community questions are short and because of the word mismatch problem: users can formulate the same query using different wording. Although many efforts have been devoted to addressing this problem, existing methods mostly rely on supervised models that depend heavily on massive training datasets and manual feature engineering. Such methods are chiefly limited in that they ignore word order and do not capture enough syntactic and semantic information in questions. We rely instead on Neural Networks (NNs), which analyze words and questions in depth to take into account both the semantics and the structure of questions when predicting semantic text similarity. We propose a deep learning approach based on a Siamese architecture with Long Short-Term Memory (LSTM) networks, augmented with an attention mechanism that lets the model weight words differently while modeling questions. We also explore the use of Convolutional Neural Networks (CNN) nested within the Siamese architecture to retrieve relevant questions. Different similarity measures were tested to predict the semantic similarity between the pairs of questions. To evaluate the proposed approach, we conducted experiments on large-scale datasets in English and Arabic.
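The Siamese idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the per-word vectors are hand-written stand-ins for LSTM hidden states, and the attention query is a fixed vector rather than a learned parameter. The abstract does not say which similarity measures were tested, so the three shown here (cosine, an exponentiated negative Manhattan distance, and an inverse Euclidean distance) are assumptions chosen for illustration.

```python
import math


def attention_pool(states, query):
    """Collapse per-word vectors into one question vector.

    Each word gets a dot-product score against `query`; a softmax turns the
    scores into weights, and the result is the weighted sum of the vectors.
    In a real model `states` would be LSTM hidden states (assumption).
    """
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, states))
              for d in range(dim)]
    return pooled, weights


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def manhattan_similarity(u, v):
    # exp(-L1 distance): 1.0 for identical vectors, approaching 0 as they diverge
    return math.exp(-sum(abs(a - b) for a, b in zip(u, v)))


def euclidean_similarity(u, v):
    # 1 / (1 + L2 distance): 1.0 for identical vectors
    return 1.0 / (1.0 + math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))))


def siamese_score(states_a, states_b, query, similarity):
    """Encode both questions with the SAME parameters, then compare them.

    Sharing `query` across both branches is what makes the setup Siamese.
    """
    vec_a, _ = attention_pool(states_a, query)
    vec_b, _ = attention_pool(states_b, query)
    return similarity(vec_a, vec_b)


# Toy inputs: two questions with two and three "words" (hypothetical values).
question_a = [[1.0, 0.0], [0.0, 1.0]]
question_b = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
shared_query = [1.0, 0.0]

for measure in (cosine_similarity, manhattan_similarity, euclidean_similarity):
    print(measure.__name__,
          siamese_score(question_a, question_b, shared_query, measure))
```

Because both questions pass through the same pooling function with the same query vector, swapping the two inputs leaves the score unchanged for all three measures, which is a convenient property when ranking archived questions against a new one.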
Main file
DKE_journal_Final_reviewed-depose.pdf (860.92 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03500114, version 1 (21-12-2021)

Cite

Nouha Othman, Rim Faiz, Kamel Smaïli. Learning English and Arabic Question Similarity with Siamese Neural Networks in Community Question Answering services. Data and Knowledge Engineering, inPress, 101962, ⟨10.1016/j.datak.2021.101962⟩. ⟨hal-03500114⟩