Arabic Text Classification Based on Word and Document Embeddings

Abdelkader El Mahdaouy; Eric Gaussier; Saïd El Alaoui Ouatik

doi:10.1007/978-3-319-48308-5_4

Book chapter, Year: 2016

Abdelkader El Mahdaouy

Role: Author
PersonId : 14726
IdHAL : el-mahdaouy
ORCID : 0000-0003-4281-2472
IdRef : 230478298

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Eric Gaussier

Role: Author
PersonId : 182833
IdHAL : eric-gaussier
ORCID : 0000-0002-8858-3233
IdRef : 074308297

Université Grenoble Alpes [2016-2019]

Laboratoire d'Informatique de Grenoble

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Saïd El Alaoui Ouatik

Role: Author
PersonId : 947979

laboratoire informatique et modélisation

Abstract

Recently, word embeddings have been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items from contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by averaging word vectors, a simple method for document embedding construction. Moreover, the results show that classification accuracy is less sensitive to the learning parameters of the word and document vectors.
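The averaging approach mentioned in the abstract can be illustrated with a minimal sketch. It assumes an unweighted mean over in-vocabulary tokens; the record does not state the exact averaging scheme used in the chapter. The function name `average_embedding`, the transliterated tokens, and the toy 2-dimensional vectors are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      word_vectors: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Return the unweighted mean of the vectors of in-vocabulary tokens."""
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens (an assumption)
        for i in range(dim):
            total[i] += vec[i]
        known += 1
    if known == 0:
        return total  # all-zero vector when no token is known
    return [x / known for x in total]


# Toy vectors invented for illustration; keys are transliterated Arabic words.
toy_vectors = {
    "kura": [1.0, 0.0],      # "ball"
    "malab": [0.5, 0.5],     # "stadium"
    "iqtisad": [0.0, 1.0],   # "economy"
}

# The last token is deliberately absent from the toy vocabulary.
doc = ["kura", "malab", "unknown_token"]
doc_vector = average_embedding(doc, toy_vectors, dim=2)
print(doc_vector)  # [0.75, 0.25]
```

The resulting document vector could then be passed to any standard classifier as a fixed-length feature vector. In practice the word vectors would come from a trained Skip-gram, CBOW, or GloVe model rather than a hand-written dictionary.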

Keywords

Doc2Vec; Arabic text classification; Arabic natural language processing; Document embeddings; Word embeddings; Skip-gram; Continuous Bag-of-Words; GloVe

Domains

Information Retrieval [cs.IR]; Document and Text Processing

https://hal.science/hal-01386136

Submitted on: Saturday, October 22, 2016, 20:30:41

Last modified on: Thursday, April 4, 2024, 20:54:21

Dates and versions

hal-01386136 , version 1 (22-10-2016)

Identifiers

HAL Id : hal-01386136 , version 1
DOI : 10.1007/978-3-319-48308-5_4

Cite

Abdelkader El Mahdaouy, Eric Gaussier, Saïd El Alaoui Ouatik. Arabic Text Classification Based on Word and Document Embeddings. Advances in Intelligent Systems and Computing, 533, Springer International Publishing, pp. 32-41, 2016, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, 978-3-319-48307-8. ⟨10.1007/978-3-319-48308-5_4⟩. ⟨hal-01386136⟩

Collections

UGA CNRS LIG LIG_SIDCH LIG_SIDCH_APTIKAL

409 views

0 downloads
