Using closely-related language to build an ASR for a very under-resourced language: Iban

Abstract: This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is mainly spoken in Sarawak, Malaysia. Because no ASR resources existed for Iban, we began this study by collecting 8 hours of data. We employed bootstrapping techniques involving a closely-related language to rapidly build and improve an Iban system. First, we used already available data from Malay, a locally dominant language in Malaysia, to bootstrap a grapheme-to-phoneme (G2P) system for the target language. We also built other types of G2P systems, including a grapheme-based G2P and an English G2P, to produce different versions of the pronunciation dictionary. We tested all of the dictionaries on the Iban ASR system to identify the best version. Second, we improved on the word error rate (WER) of the baseline GMM system by using subspace Gaussian mixture models (SGMM). For testing, we set two levels of data sparseness on the Iban data: 7 hours and 1 hour of transcribed speech. We investigated cross-lingual SGMM, in which the shared parameters were obtained in either a monolingual or a multilingual fashion and then applied to the target language for training. Experiments using out-of-language data (English and Malay) as source languages resulted in lower WERs when Iban data was very limited.
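The abstract compares pronunciation dictionaries, one of them grapheme-based, by the WER they yield. The sketch below illustrates both ideas. It is not the authors' code or toolkit. It assumes that a grapheme-based dictionary simply treats each letter of a word as one acoustic unit, and that WER is the standard word-level Levenshtein distance divided by the number of reference words. All function names are hypothetical.

```python
"""Hypothetical illustration (not the authors' implementation):
a grapheme-based pronunciation lexicon and a word error rate (WER) function."""


def grapheme_pronunciation(word: str) -> list[str]:
    """Grapheme-based 'G2P': each letter of the word is used as one unit.

    Assumption: the spelling is alphabetic; case is folded and non-letters dropped.
    """
    return [ch for ch in word.lower() if ch.isalpha()]


def build_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Map every distinct word to its grapheme sequence."""
    return {w: grapheme_pronunciation(w) for w in sorted(set(words))}


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic programming: row i holds the edit distances between ref[:i]
    # and hyp[:j] for every j; only the previous row is kept in memory.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Placeholder words and sentences, for illustration only.
    print(build_lexicon(["rumah", "Rumah"]))
    print(word_error_rate("a b c d", "a x c"))
```

With the placeholder input above, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion over four reference words, giving 0.5. In practice, each candidate dictionary would be plugged into the same recognizer and the one with the lowest WER on held-out speech kept.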
Document type:
Conference paper
Oriental COCOSDA 2014, Sep 2014, Phuket, Thailand. 5 p., 2014

Cited literature: 28 references

https://hal.archives-ouvertes.fr/hal-01055576
Contributor: Laurent Besacier
Submitted on: Wednesday, August 13, 2014 - 10:14:37
Last modified on: Thursday, October 11, 2018 - 08:48:03
Document(s) archived on: Wednesday, November 26, 2014 - 23:50:32

File

IS14full_paper-sarah_2.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01055576, version 1

Citation

Sarah Samson Juan, Laurent Besacier, Benjamin Lecouteux, Tan Tien Ping. Using closely-related language to build an ASR for a very under-resourced language: Iban. Oriental COCOSDA 2014, Sep 2014, Phuket, Thailand. 5 p., 2014. 〈hal-01055576〉


Metrics

Record views: 272

File downloads: 521