Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect

Mohamed Menacer 1, Odile Mella 2, Dominique Fohr 2, Denis Jouvet 2, David Langlois 1, Kamel Smaïli 1
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper addresses the development of an Automatic Speech Recognition system for Modern Standard Arabic (MSA) and its extension to Algerian dialect. Algerian dialect is very different from the Arabic dialects of the Middle East, since it is highly influenced by the French language. In this article, we start by presenting the new automatic speech recognition system named ALASR (Arabic Loria Automatic Speech Recognition). The acoustic model of ALASR is based on a DNN approach and the language model is a classical n-gram. Several options are investigated in this paper to find the best combination of models and parameters. ALASR achieves good results for MSA in terms of word error rate (WER, 14.02%), but it completely collapses on a 70-minute Algerian dialect data set (a WER of 89%). To take into account the impact of the French language on the Algerian dialect, we combine two acoustic models in ALASR: the original MSA model and a French model trained on the ESTER corpus. This solution has been adopted because no transcribed speech data for Algerian dialect are available. This combination leads to a substantial absolute reduction in word error rate of 24%. © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 3rd International Conference on Arabic Computational Linguistics.
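The abstract reports its results as word error rate (WER). As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), WER can be computed as below. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; this is not the scoring tool used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a 4-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. An "absolute" reduction, as used in the abstract, is a difference in percentage points rather than a relative change.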
Document type:
Conference paper
ACLing 2017 - 3rd International Conference on Arabic Computational Linguistics, Nov 2017, Dubai, United Arab Emirates. pp.1-8, 2017

Cited literature: 24 references

https://hal.archives-ouvertes.fr/hal-01583842
Contributor: Kamel Smaïli
Submitted on: Friday, 8 September 2017 - 07:09:26
Last modified on: Tuesday, 18 December 2018 - 16:38:02

File

ACLing2017_22_MenacerMellaFohr...
Files produced by the author(s)

License


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : hal-01583842, version 1

Citation

Mohamed Menacer, Odile Mella, Dominique Fohr, Denis Jouvet, David Langlois, et al. Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect. ACLing 2017 - 3rd International Conference on Arabic Computational Linguistics, Nov 2017, Dubai, United Arab Emirates. pp.1-8, 2017. 〈hal-01583842〉
