Book sections

Arabic Text Classification Based on Word and Document Embeddings

Abstract : Word Embeddings have recently been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items based on contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document Embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document Embeddings, whether learned with Doc2Vec or built by simply averaging word vectors, outperform text preprocessing techniques. Moreover, the results show that classification accuracy is not very sensitive to the parameters used to learn the word and document vectors.
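The averaging approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact construction, so this assumes a plain unweighted mean over the words that have a vector; the function name `average_document_vector`, the two-dimensional toy vectors, and the example tokens are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def average_document_vector(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the unweighted mean of the vectors of all in-vocabulary tokens.

    Assumption: out-of-vocabulary tokens are skipped, and a document with no
    known tokens maps to the zero vector. The paper may handle these cases
    differently.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue  # skip words that have no learned embedding
        for i in range(dim):
            total[i] += vector[i]
        count += 1
    if count == 0:
        return total  # zero vector for documents with no known words
    return [value / count for value in total]


# Toy, made-up embeddings purely for illustration.
toy_vectors = {
    "كتاب": [1.0, 0.0],
    "مكتبة": [0.0, 1.0],
}
document = ["كتاب", "مكتبة", "غير_معروف"]  # the last token is out of vocabulary
doc_vec = average_document_vector(document, toy_vectors, dim=2)
```

In practice the resulting document vectors would be passed to a standard classifier as fixed-length feature vectors, in place of bag-of-words features.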
Contributor : Abdelkader El Mahdaouy
Submitted on : Saturday, October 22, 2016 - 8:30:41 PM
Last modification on : Wednesday, November 3, 2021 - 6:46:40 AM




Abdelkader El Mahdaouy, Eric Gaussier, Saïd El Alaoui Ouatik. Arabic Text Classification Based on Word and Document Embeddings. Advances in Intelligent Systems and Computing, 533, Springer International Publishing, pp. 32-41, 2016, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, 978-3-319-48307-8. ⟨10.1007/978-3-319-48308-5_4⟩. ⟨hal-01386136⟩


