Book sections

Arabic Text Classification Based on Word and Document Embeddings

Abstract : Word embeddings have recently been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items from contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by a simple method that averages word vectors. Moreover, the results show that classification accuracy is relatively insensitive to the learning parameters of the word and document vectors.
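The averaging approach mentioned in the abstract can be illustrated with a minimal sketch. It assumes an unweighted mean over the tokens that have a vector, which may differ from the exact construction used in the paper. The function name `average_word_vectors`, the toy vectors, and the transliterated tokens are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_word_vectors(tokens: Sequence[str],
                         word_vectors: Dict[str, Vector],
                         dim: int) -> Vector:
    """Return the unweighted mean of the vectors of the known tokens.

    Tokens without a vector are skipped. A document with no known token
    maps to the zero vector (an assumption made for this sketch).
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary token: ignore it
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


# Toy 2-dimensional vectors, invented for illustration only.
# Real vectors would come from a trained word embedding model.
toy_vectors = {
    "kitab": [1.0, 0.0],
    "ilm": [0.0, 1.0],
}

doc_embedding = average_word_vectors(["kitab", "ilm", "unknown"], toy_vectors, dim=2)
print(doc_embedding)  # [0.5, 0.5]
```

The resulting fixed-length vector can then be passed to any standard classifier in place of a bag-of-words feature vector.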

https://hal.archives-ouvertes.fr/hal-01386136
Contributor : Abdelkader El Mahdaouy
Submitted on : Saturday, October 22, 2016 - 8:30:41 PM
Last modification on : Monday, April 20, 2020 - 11:24:02 AM

Collections

CNRS | LIG | UGA

Citation

Abdelkader El Mahdaouy, Eric Gaussier, Saïd El Alaoui Ouatik. Arabic Text Classification Based on Word and Document Embeddings. Advances in Intelligent Systems and Computing, 533, Springer International Publishing, pp. 32-41, 2016, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, 978-3-319-48307-8. ⟨10.1007/978-3-319-48308-5_4⟩. ⟨hal-01386136⟩
