An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings
Conference paper, Year: 2018


Abstract

The goal of this work is to automatically build an Algerian dialect lexicon from a social network (YouTube). Each entry of this lexicon consists of a word written either in Arabic script (Modern Standard Arabic or dialect) or in Latin script (Arabizi, French or English). For each word, several transliterations are proposed, written in a script different from the one used for the word itself. To do this, we harvested and aligned an Algerian dialect corpus using an iterative method based on multilingual word embedding representations. The corpus is multilingual because Algerians use several languages to post comments on social networks: Modern Standard Arabic (MSA), Algerian dialect, French and sometimes English. In addition, social network users write freely, without any regard for the grammar of these languages. We tested the proposed method on a test lexicon, where it achieves an F-measure of 73%.
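The abstract does not spell out the iterative alignment procedure. As a rough illustration only, the sketch below shows two generic building blocks such a pipeline might rely on: proposing transliteration candidates for a word by taking its nearest neighbours (cosine similarity) among words written in the other script in a shared embedding space, and scoring the resulting lexicon against a test lexicon with an F-measure over (word, transliteration) pairs. The toy vectors, the function names `script_of`, `propose_transliterations` and `f_measure`, and the crude Unicode-range script test are all hypothetical assumptions; this is not the authors' method.

```python
import math

# Hypothetical toy embeddings (invented values). A real system would train
# multilingual embeddings on the harvested corpus instead.
embeddings = {
    "مليح": [0.9, 0.1],      # dialect word, Arabic script
    "mlih": [0.88, 0.12],    # same word in Arabizi (Latin script)
    "bien": [0.7, 0.3],      # French distractor
    "خدمة": [0.1, 0.9],      # dialect word, Arabic script
    "khedma": [0.12, 0.88],  # same word in Arabizi
}


def script_of(word: str) -> str:
    """Crude assumption: 'arabic' if any char is in the basic Arabic block, else 'latin'."""
    return "arabic" if any("\u0600" <= ch <= "\u06ff" for ch in word) else "latin"


def cosine(u, v) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propose_transliterations(word, vectors, k=3):
    """Return the k most similar words whose script differs from that of `word`."""
    src = vectors[word]
    src_script = script_of(word)
    scored = [
        (other, cosine(src, vec))
        for other, vec in vectors.items()
        if other != word and script_of(other) != src_script
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


def f_measure(predicted, gold) -> float:
    """F1 over (word, transliteration) pairs, comparing a proposed lexicon to a test lexicon."""
    pred_pairs = {(w, t) for w, ts in predicted.items() for t in ts}
    gold_pairs = {(w, t) for w, ts in gold.items() for t in ts}
    tp = len(pred_pairs & gold_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


predicted = {w: propose_transliterations(w, embeddings, k=1) for w in ("مليح", "khedma")}
gold = {"مليح": ["mlih"], "khedma": ["خدمة"], "mlih": ["مليح"]}
print(predicted)
print(f_measure(predicted, gold))
```

Restricting the neighbour search to the other script mirrors the lexicon's requirement that each proposed transliteration be written in a script different from its entry. In this toy run, both proposals are correct but one gold pair is never queried, so precision is 1.0, recall is 2/3 and the F-measure is 0.8.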
Main file: KarimaKamelLREC.pdf (283.47 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01718110, version 1 (27-02-2018)

Identifiers

  • HAL Id: hal-01718110, version 1

Cite

Karima Abidi, Kamel Smaïli. An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings. 11th edition of the Language Resources and Evaluation Conference, LREC 2018, May 2018, Miyazaki, Japan. ⟨hal-01718110⟩
