Automatic speech recognition system for Tunisian dialect

Abir Masmoudi; Fethi Bougares; Mariem Ellouze; Yannick Estève; Lamia Belguith

doi:10.1007/s10579-017-9402-y

Journal article. Language Resources and Evaluation. Year: 2018


Abir Masmoudi

Role: Author

Multimedia, InfoRmation systems and Advanced Computing Laboratory

Fethi Bougares

Role: Author
PersonId: 768825
IdRef: 170400883

Laboratoire d'Informatique de l'Université du Mans

Mariem Ellouze

Role: Author
PersonId: 969838

Multimedia, InfoRmation systems and Advanced Computing Laboratory

Yannick Estève

Role: Author
PersonId: 11645
IdHAL: yannick-esteve
ORCID: 0000-0002-3656-8883
IdRef: 070531668

Laboratoire d'Informatique de l'Université du Mans

Lamia Belguith

Role: Author

Multimedia, InfoRmation systems and Advanced Computing Laboratory

Abstract

Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging because annotated resources and tools are scarce, the dialect lacks standardization at all linguistic levels (phonological, morphological, syntactic and lexical), and the pronunciation dictionary needed for ASR development is missing. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we detail the creation of a Tunisian dialect corpus. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate of 22.6%.

Keywords

Tunisian dialect; Automatic speech recognition; Under-resourced language; Rule-based; Grapheme-to-phoneme conversion

Domains

Computation and Language [cs.CL]

Main file

Journal_20082017 final.pdf (542.95 KB)

Origin: Files produced by the author(s)

Contributor: Yannick Estève

https://hal.science/hal-01592416

Submitted on: Friday, 29 June 2018, 11:32:43

Last modified on: Tuesday, 7 November 2023, 15:00:02

Long-term archiving on: Thursday, 27 September 2018, 08:23:15

Dates and versions

hal-01592416, version 1 (29-06-2018)

Identifiers

HAL Id: hal-01592416, version 1
DOI: 10.1007/s10579-017-9402-y

Cite

Abir Masmoudi, Fethi Bougares, Mariem Ellouze, Yannick Estève, Lamia Belguith. Automatic speech recognition system for Tunisian dialect. Language Resources and Evaluation, 2018, 52 (1), pp. 249-267. ⟨10.1007/s10579-017-9402-y⟩. ⟨hal-01592416⟩


Collections

UNIV-LEMANS LIUM LIUM-LST

