Arabic Text Classification Based on Word and Document Embeddings

Abstract: Word embeddings were recently introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items from contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by averaging word vectors, a simple method for document embedding construction. Moreover, the results show that classification accuracy is relatively insensitive to the parameters used to learn word and document vectors.
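The averaging approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_document_vector`, the toy two-dimensional vectors, and the transliterated tokens are all assumptions made for illustration. In practice the word vectors would come from a trained embedding model, and the choice to skip out-of-vocabulary tokens is one plausible convention among several.

```python
from typing import Dict, List, Sequence


def average_document_vector(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of in-vocabulary tokens.

    Returns a zero vector when no token is in the vocabulary.
    """
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # assumption: out-of-vocabulary tokens are skipped
        for i in range(dim):
            total[i] += vec[i]
        known += 1
    if known == 0:
        return total
    return [x / known for x in total]


# Hypothetical toy vectors, invented for illustration only.
toy_vectors = {
    "kitab": [1.0, 0.0],
    "madrasa": [0.0, 1.0],
}

# The third token is deliberately absent from the toy vocabulary.
doc = ["kitab", "madrasa", "unknown_token"]
doc_vector = average_document_vector(doc, toy_vectors, dim=2)
print(doc_vector)  # [0.5, 0.5]
```

The resulting fixed-length vector could then be passed to any standard classifier in place of a bag-of-words representation.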
Document type:
Book chapter
Advances in Intelligent Systems and Computing, 533, Springer International Publishing, pp. 32-41, 2016, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, 978-3-319-48307-8. <10.1007/978-3-319-48308-5_4>

https://hal.archives-ouvertes.fr/hal-01386136
Contributor: Abdelkader El Mahdaouy
Submitted on: Saturday, October 22, 2016 - 20:30:41
Last modified on: Monday, December 12, 2016 - 14:56:11


Citation

Abdelkader El Mahdaouy, Eric Gaussier, Saïd El Alaoui Ouatik. Arabic Text Classification Based on Word and Document Embeddings. Advances in Intelligent Systems and Computing, 533, Springer International Publishing, pp. 32-41, 2016, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, 978-3-319-48307-8. <10.1007/978-3-319-48308-5_4>. <hal-01386136>
