Journal article. International Journal of Advanced Computer Science and Applications (IJACSA). Year: 2019

Convolutional Neural Networks in Predicting Missing Text in Arabic

Abstract

Missing text prediction is one of the major concerns of the Natural Language Processing and deep learning community. However, most research on text prediction has been carried out on languages other than Arabic. In this paper, we take a first step in training a deep learning language model on the Arabic language. Our contribution is the prediction of missing text in text documents by applying Convolutional Neural Networks (CNN) to Arabic language models. We built CNN-based language models with settings specific to the Arabic language. We prepared our dataset from a large quantity of text documents freely downloaded from the Arab World Books, Hindawi Foundation, and Shamela datasets. To calculate prediction accuracy, we compared documents with complete text against the same documents with missing text. We carried out the training, validation, and test steps in three stages, aiming to increase prediction performance. In the first stage, the model was trained on documents by the same author; in the second stage, on documents from the same dataset; and in the third stage, on all documents combined. The training, validation, and test steps were repeated many times, changing the author, the dataset, and the author-dataset combination, respectively. We also enlarged the training data by feeding the CNN model a larger quantity of text each time. The model achieved high performance in Arabic text prediction, with an accuracy reaching 97.8% in the best case.
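The abstract says accuracy is obtained by comparing complete documents with the same documents containing missing text, but it does not give the formula. The following is a minimal sketch of one plausible reading, assuming token-level exact-match accuracy over the missing positions. The `MASK` placeholder, the `prediction_accuracy` function, and the toy `copy_previous` predictor are hypothetical names introduced only for illustration; they are not the authors' CNN model or evaluation code.

```python
from typing import Callable, Sequence

# Hypothetical placeholder marking a removed word (an assumption, not from the paper).
MASK = "<missing>"


def prediction_accuracy(
    complete: Sequence[str],
    masked: Sequence[str],
    predict: Callable[[Sequence[str], int], str],
) -> float:
    """Return the fraction of masked positions whose predicted word
    exactly matches the word in the complete document."""
    if len(complete) != len(masked):
        raise ValueError("complete and masked documents must have the same length")
    positions = [i for i, tok in enumerate(masked) if tok == MASK]
    if not positions:
        raise ValueError("masked document contains no missing positions")
    correct = sum(predict(masked, i) == complete[i] for i in positions)
    return correct / len(positions)


def copy_previous(tokens: Sequence[str], i: int) -> str:
    """Toy stand-in for a trained model: guess the word just before the gap."""
    return tokens[i - 1] if i > 0 else ""


complete_doc = ["a", "b", "c", "c"]
masked_doc = ["a", MASK, "c", MASK]
accuracy = prediction_accuracy(complete_doc, masked_doc, copy_previous)
print(accuracy)  # 0.5: one of the two gaps is filled correctly
```

In this reading, any trained model can be plugged in through the `predict` argument, so the same function could be reused across the three training stages the abstract describes.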
Main file
Paper_68-Convolutional_Neural_Networks_in_Predicting_Missing_Text.pdf (201.17 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02174614 , version 1 (05-07-2019)

License

Attribution

Cite

Adnan Souri, Abdelali Zbakh, Mohamed Alachhab, Badr Eddine Elmohajir. Convolutional Neural Networks in Predicting Missing Text in Arabic. International journal of advanced computer science and applications (IJACSA), 2019, 10 (6), ⟨10.14569/IJACSA.2019.0100668⟩. ⟨hal-02174614⟩