ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model

Abstract : Word Embeddings (WE) are increasingly popular and widely applied in Natural Language Processing (NLP) applications such as Machine Translation (MT), Information Retrieval (IR) and Information Extraction (IE), owing to their effectiveness in capturing the semantic properties of words. In this paper, we propose ArbEngVec, an open-source resource that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset of more than 93 million Arabic-English parallel sentence pairs. In addition, we perform both extrinsic and intrinsic evaluations of the different word embedding model variants. The extrinsic evaluation assesses the performance of the models on the cross-language Semantic Textual Similarity (STS) task, while the intrinsic evaluation is based on the Word Translation (WT) task.
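
As an illustration of the Word Translation (WT) task mentioned in the abstract, the following is a minimal sketch of how a translation can be retrieved from a shared Arabic-English embedding space by cosine nearest-neighbour search and scored with precision at 1. It is not the authors' evaluation code: the function names (cosine, translate, precision_at_1) are hypothetical, and the three-dimensional vectors are invented toy values rather than ArbEngVec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def translate(word, source_vectors, target_vectors):
    """Return the target-language word whose vector is closest to `word`."""
    query = source_vectors[word]
    return max(target_vectors, key=lambda t: cosine(query, target_vectors[t]))


def precision_at_1(gold_pairs, source_vectors, target_vectors):
    """Fraction of gold (source, target) pairs recovered as the top neighbour."""
    hits = sum(
        1
        for src, tgt in gold_pairs
        if translate(src, source_vectors, target_vectors) == tgt
    )
    return hits / len(gold_pairs)


# Toy vectors (assumption): both languages share one embedding space.
arabic = {"كتاب": [0.9, 0.1, 0.0], "قطة": [0.1, 0.9, 0.1]}
english = {
    "book": [0.8, 0.2, 0.1],
    "cat": [0.0, 1.0, 0.2],
    "car": [0.1, 0.1, 0.9],
}
gold = [("كتاب", "book"), ("قطة", "cat")]

score = precision_at_1(gold, arabic, english)
print(f"P@1 on the toy dictionary: {score:.2f}")
```

With real models, the two dictionaries would be replaced by the Arabic and English vocabularies of a trained cross-lingual model, and the gold pairs by a bilingual test dictionary.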

https://hal.archives-ouvertes.fr/hal-02150003
Contributor : Didier Schwab
Submitted on : Thursday, June 6, 2019 - 8:32:01 PM
Last modification on : Saturday, July 6, 2019 - 3:29:08 PM

File

Lachraf-el-al-WANLP.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02150003, version 1

Collections

LIG | UGA

Citation

Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, Didier Schwab. ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model. The Fourth Arabic Natural Language Processing Workshop, Jul 2019, Florence, Italy. ⟨hal-02150003⟩
