Arabic Text Classification Based on Word and Document Embeddings

Abstract : Word embeddings have recently been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items from contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by a simple method that averages word vectors. Moreover, the results show that classification accuracy is relatively insensitive to the parameters used to learn the word and document vectors.
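As an illustration of the averaging idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a document embedding is the plain unweighted mean of the vectors of its in-vocabulary tokens; the function name `average_document_embedding`, the toy 2-dimensional vectors, and the choice to skip out-of-vocabulary tokens are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Sequence


def average_document_embedding(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the mean of the vectors of the tokens found in word_vectors.

    Assumption for this sketch: out-of-vocabulary tokens are skipped, and a
    document with no known tokens maps to the all-zero vector.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # skip tokens that have no learned vector
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total  # no known tokens: all-zero vector
    return [value / count for value in total]


# Toy 2-dimensional word vectors, invented purely for illustration.
toy_vectors = {
    "كتاب": [1.0, 0.0],
    "مدرسة": [0.0, 1.0],
}

# A tokenized document; the last token is deliberately out of vocabulary.
doc = ["كتاب", "مدرسة", "غير_معروف"]
doc_vector = average_document_embedding(doc, toy_vectors, dim=2)
print(doc_vector)
```

In practice the word vectors would come from a trained embedding model, and the resulting document vector would be passed to a standard classifier; both of those steps are outside this sketch.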
Contributor : Abdelkader El Mahdaouy
Submitted on : Saturday, October 22, 2016 - 8:30:41 PM
Last modification on : Thursday, October 11, 2018 - 8:48:05 AM




Abdelkader El Mahdaouy, Eric Gaussier, Saïd El Alaoui Ouatik. Arabic Text Classification Based on Word and Document Embeddings. Advances in Intelligent Systems and Computing, 533, Springer International Publishing, pp. 32-41, 2016, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, 978-3-319-48307-8. 〈10.1007/978-3-319-48308-5_4〉. 〈hal-01386136〉


