An enhanced automatic speech recognition system for Arabic

Automatic speech recognition for Arabic is a very challenging task. Although classical Automatic Speech Recognition (ASR) techniques can be applied efficiently to Arabic, the specificities of the language must be taken into account to improve system performance. In this article, we focus on Modern Standard Arabic (MSA) speech recognition. We introduce the challenges related to the Arabic language, namely its complex morphology and the absence of short vowels in written text, which leads to several potential, often conflicting, vowelizations for each grapheme. We develop an ASR system for MSA using the Kaldi toolkit. Several acoustic and language models are trained. We obtain a Word Error Rate (WER) of 14.42% for the baseline system and a 12.2% relative improvement by rescoring the lattice and by rewriting the output with the correct hamza above or below Alif.
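To make the two reported figures concrete, the following is a minimal sketch of how a word error rate and a relative improvement are commonly computed. It is not the authors' scoring pipeline: the function names `word_error_rate` and `wer_after_relative_improvement` are hypothetical, and the sketch assumes that both figures in the abstract are percentages and that "relative improvement" means a relative reduction of the baseline WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def wer_after_relative_improvement(baseline_wer: float,
                                   relative_improvement_pct: float) -> float:
    """WER implied by a relative reduction (assumed meaning of 'relative improvement')."""
    return baseline_wer * (1.0 - relative_improvement_pct / 100.0)


if __name__ == "__main__":
    # Toy example: one substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
    # Figures taken from the abstract, interpreted as percentages.
    print(round(wer_after_relative_improvement(14.42, 12.2), 2))
```

Under that reading, a 14.42% baseline with a 12.2% relative reduction would correspond to a WER of roughly 12.66%. This value is derived arithmetic and is not a figure stated in the abstract.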

Keywords

Arabic, Speech recognition, Language model, Arabic processing

Domains

Computation and Language [cs.CL]

Main file

ArticleEACL2017VF.pdf (181.54 KB)

Origin: Files produced by the author(s)

Contributor: Kamel Smaïli

https://hal.science/hal-01531588

Submitted on: Thursday, June 1, 2017, 19:27:51

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Wednesday, December 13, 2017, 07:12:08

Dates and versions

hal-01531588, version 1 (June 1, 2017)

Identifiers

HAL Id: hal-01531588, version 1

Cite

Mohamed Amine Menacer, Odile Mella, Dominique Fohr, Denis Jouvet, David Langlois, et al. An enhanced automatic speech recognition system for Arabic. The third Arabic Natural Language Processing Workshop - EACL 2017, Apr 2017, Valencia, Spain. ⟨hal-01531588⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD
