Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban

Abstract: This paper presents our strategies for developing an automatic speech recognition (ASR) system for Iban, an under-resourced language. We faced several challenges, such as the absence of a pronunciation dictionary and a lack of training material for building acoustic models. To overcome these problems, we propose approaches that exploit resources from a closely related language, Malay. We developed a semi-supervised method for building the pronunciation dictionary and applied cross-lingual strategies to improve acoustic models trained on very limited data. Both approaches yielded very encouraging results, showing that data from a closely related language, if available, can be exploited to build ASR for a new language. In the final part of the paper, we present a zero-shot ASR system built from Malay resources that can serve as an alternative method for transcribing Iban speech.
Document type:
Conference paper
Interspeech 2015, Sep 2015, Dresden, Germany. 2015
Cited literature: 31 references

https://hal.archives-ouvertes.fr/hal-01170493
Contributor: Laurent Besacier
Submitted on: Tuesday, 15 September 2015 - 17:04:35
Last modified on: Thursday, 11 October 2018 - 08:48:03
Document(s) archived on: Monday, 28 December 2015 - 22:33:47

File

IS2015_samsonjuan_camera-ready...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01170493, version 1

Citation

Sarah Samson, Laurent Besacier, Benjamin Lecouteux, Mohamed Dyab. Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban. Interspeech 2015, Sep 2015, Dresden, Germany. 2015. 〈hal-01170493〉

Metrics

Record views

344

File downloads

282