Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect

Mohamed Menacer 1 Odile Mella 2 Dominique Fohr 2 Denis Jouvet 2 David Langlois 1 Kamel Smaïli 1
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This paper addresses the development of an Automatic Speech Recognition system for Modern Standard Arabic (MSA) and its extension to Algerian dialect. Algerian dialect is very different from the Arabic dialects of the Middle East, since it is highly influenced by the French language. In this article, we start by presenting the new automatic speech recognition system named ALASR (Arabic Loria Automatic Speech Recognition). The acoustic model of ALASR is based on a DNN approach and the language model is a classical n-gram. Several options are investigated in this paper to find the best combination of models and parameters. ALASR achieves good results for MSA in terms of word error rate (WER of 14.02%), but it completely collapses on a 70-minute Algerian dialect data set (WER of 89%). In order to take into account the impact of the French language on the Algerian dialect, we combine in ALASR two acoustic models: the original one (MSA) and a French one trained on the ESTER corpus. This solution has been adopted because no transcribed speech data for Algerian dialect are available. This combination leads to a substantial absolute reduction of 24% in word error rate. © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 3rd International Conference on Arabic Computational Linguistics.
Document type :
Conference papers
Cited literature [24 references]

https://hal.archives-ouvertes.fr/hal-01583842
Contributor : Kamel Smaïli
Submitted on : Friday, September 8, 2017 - 7:09:26 AM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM

File

ACLing2017_22_MenacerMellaFohr...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : hal-01583842, version 1

Citation

Mohamed Menacer, Odile Mella, Dominique Fohr, Denis Jouvet, David Langlois, et al. Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect. ACLing 2017 - 3rd International Conference on Arabic Computational Linguistics, Nov 2017, Dubai, United Arab Emirates. pp.1-8. ⟨hal-01583842⟩

Metrics

Record views

1140

File downloads

299