Automatic speech recognition system for Tunisian dialect

Abstract: Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging because annotated resources and tools are scarce, the dialect lacks standardization at all linguistic levels (phonological, morphological, syntactic and lexical), and the pronunciation dictionary needed for ASR development is missing. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we detail the creation of the Tunisian dialect corpus. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate of 22.6%.
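
For readers unfamiliar with the metric quoted in the abstract, Word Error Rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring pipeline; the function name `word_error_rate`, the whitespace tokenization and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no text normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 22.6% corresponds to roughly 23 word-level errors per 100 reference words.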
Document type:
Journal article
Language Resources and Evaluation, Springer Verlag, 2018, 52 (1), pp.249-267. 〈10.1007/s10579-017-9402-y〉

https://hal.archives-ouvertes.fr/hal-01592416
Contributor: Yannick Estève
Submitted on: Friday, 29 June 2018 - 11:32:43
Last modified on: Tuesday, 10 July 2018 - 18:52:24
Document(s) archived on: Thursday, 27 September 2018 - 08:23:15

File

Journal_20082017 final.pdf
Files produced by the author(s)

Citation

Abir Masmoudi, Fethi Bougares, Mariem Ellouze, Yannick Estève, Lamia Belguith. Automatic speech recognition system for Tunisian dialect. Language Resources and Evaluation, Springer Verlag, 2018, 52 (1), pp.249-267. 〈10.1007/s10579-017-9402-y〉. 〈hal-01592416〉
