Conference papers

Which unit for acoustic and language modeling for Khmer Automatic Speech Recognition?

Abstract : In this paper we present an overview of the development of a large-vocabulary continuous speech recognition system for the Khmer language. We present methods and tools for quickly collecting the language resources needed to develop an ASR system for a new under-resourced language. Faced with a lack of text data and with word segmentation errors in language modeling, we investigate how different views of the text data (word and sub-word units) can be exploited for Khmer language modeling. We propose to work both at the model level (by building hybrid vocabularies that contain both word and sub-word units) and at the ASR output level (by using a simple N-best list voting mechanism). For acoustic modeling, we use basic linguistic rules to automatically generate grapheme-based and phoneme-based pronunciation dictionaries. An experimental framework is set up to evaluate the performance of each modeling unit.
Index Terms : ASR, Khmer, word and sub-word units, acoustic modeling, language modeling.
Keywords : Khmer ASR Speech
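
The abstract mentions a simple N-best list voting mechanism at the ASR output level but gives no details. The following is only a minimal sketch of one possible voting scheme, not the authors' implementation. It assumes that each recognizer (for example, one using word units and one using sub-word units) returns its hypotheses as strings ordered best-first, and that hypotheses can be compared after removing the spaces between units. The function name `vote_nbest` and the placeholder strings are hypothetical.

```python
from collections import defaultdict


def vote_nbest(nbest_lists):
    """Return the hypothesis that appears in the most N-best lists.

    nbest_lists: list of N-best lists, each a list of strings ordered
    best-first. Spaces are removed before comparison (an assumption of
    this sketch) so that word-level and sub-word-level segmentations of
    the same text count as the same candidate.
    Ties are broken by the best (lowest) rank the candidate reached.
    """
    votes = defaultdict(int)
    best_rank = {}
    for hyps in nbest_lists:
        seen = set()  # each system votes at most once per candidate
        for rank, hyp in enumerate(hyps):
            key = "".join(hyp.split())
            if key in seen:
                continue
            seen.add(key)
            votes[key] += 1
            best_rank[key] = min(rank, best_rank.get(key, rank))
    # Most votes first, then best rank
    return min(votes, key=lambda k: (-votes[k], best_rank[k]))


# Placeholder hypotheses (Latin letters stand in for real recognizer output)
word_system = ["ab cd ef", "ab cd gh"]
subword_system = ["a b c d g h", "a b c d e f"]
hybrid_system = ["ab cdgh", "x y"]
winner = vote_nbest([word_system, subword_system, hybrid_system])
```

Here the candidate `abcdgh` wins because it appears in all three lists, even though the word-based system ranked it second. A real system would more likely weight votes by rank or by recognizer score; this sketch counts each list equally.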

Contributor : Brigitte Bigi
Submitted on : Tuesday, December 13, 2016 - 2:47:00 PM
Last modification on : Friday, July 17, 2020 - 11:10:26 AM
Long-term archiving on : Tuesday, March 14, 2017 - 11:51:01 AM


Publisher files allowed on an open archive




  • HAL Id : hal-01392526, version 1


Sopheap Seng, Sethserey Sam, Viet-Bac Le, Brigitte Bigi, Laurent Besacier. Which unit for acoustic and language modeling for Khmer Automatic Speech Recognition? International Workshop on Spoken Languages Technologies for Under-resourced languages, 2008, Hanoi, Vietnam. pp.33-38. ⟨hal-01392526⟩


