Conference papers

Using closely-related language to build an ASR for a very under-resourced language: Iban

Abstract : This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is mainly spoken in Sarawak, Malaysia. Because no ASR resources existed for Iban, we began this study by collecting 8 hours of data. We employed bootstrapping techniques involving a closely-related language to rapidly build and improve an Iban system. First, we used already available data from Malay, a locally dominant language in Malaysia, to bootstrap a grapheme-to-phoneme (G2P) system for the target language. We also built various types of G2Ps, including a grapheme-based and an English G2P, to produce different versions of the pronunciation dictionary. We tested all of the dictionaries on the Iban ASR system to identify the best version. Second, we improved on the word error rate (WER) of the baseline GMM system by using subspace Gaussian mixture models (SGMM). For testing, we set two levels of data sparseness on the Iban data: 7 hours and 1 hour of transcribed speech. We investigated cross-lingual SGMM, where the shared parameters were obtained in either a monolingual or a multilingual fashion and then applied to the target language for training. Experiments using out-of-language data (English and Malay) as source languages yield lower WERs when Iban data is very limited.
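The abstract compares systems by word error rate (WER). As a reading aid, the metric can be sketched as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. This is the standard definition, not code from the paper; the function name `word_error_rate` and the placeholder token strings are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with no hypothesis words costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Iban data: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.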
Document type : Conference papers

Cited literature : 28 references

https://hal.archives-ouvertes.fr/hal-01055576
Contributor : Laurent Besacier
Submitted on : Wednesday, August 13, 2014 - 10:14:37 AM
Last modification on : Thursday, October 21, 2021 - 3:47:53 AM
Long-term archiving on : Wednesday, November 26, 2014 - 11:50:32 PM

File

IS14full_paper-sarah_2.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01055576, version 1

Citation

Sarah Samson Juan, Laurent Besacier, Benjamin Lecouteux, Tan Tien Ping. Using closely-related language to build an ASR for a very under-resourced language: Iban. Oriental COCOSDA 2014, Sep 2014, Phuket, Thailand. 5 p. ⟨hal-01055576⟩

Metrics

Record views : 189
Files downloads : 506