Which unit for acoustic and language modeling for Khmer Automatic Speech Recognition?

Abstract : In this paper we present an overview of the development of a large-vocabulary continuous speech recognition system for the Khmer language. We present methods and tools for quickly collecting language resources when building an ASR system for a new under-resourced language. Faced with the lack of text data and with word segmentation errors in language modeling, we investigate how different views of the text data (word and sub-word units) can be exploited for Khmer language modeling. We propose to work both at the model level (by building hybrid vocabularies with both word and sub-word units) and at the ASR output level (by using a simple N-best list voting mechanism). For acoustic modeling, we use basic linguistic rules to automatically generate grapheme-based and phoneme-based pronunciation dictionaries. An experimental framework is set up to evaluate the performance of each modeling unit. Index Terms: ASR, Khmer, word and sub-word units, acoustic modeling, language modeling.
Keywords : Khmer, ASR, Speech

https://hal.archives-ouvertes.fr/hal-01392526
Contributor : Brigitte Bigi
Submitted on : Tuesday, December 13, 2016 - 2:47:00 PM
Last modification on : Monday, July 8, 2019 - 3:08:37 PM
Long-term archiving on : Tuesday, March 14, 2017 - 11:51:01 AM

File

seng2008sltu.pdf
Publisher files allowed on an open archive

Licence


Copyright

Identifiers

  • HAL Id : hal-01392526, version 1

Citation

Sopheap Seng, Sethserey Sam, Viet-Bac Le, Brigitte Bigi, Laurent Besacier. Which unit for acoustic and language modeling for Khmer Automatic Speech Recognition?. International Workshop on Spoken Languages Technologies for Under-resourced languages, 2008, Hanoi, Vietnam. pp.33-38. ⟨hal-01392526⟩
