Development of a Korean speech recognition system with little annontated data
Résumé
This paper investigates the development of a speech-to-text transcription system for the Korean language in the context of the DGA RAPID Rapmat project. Korean is an alpha-syllabary language spoken by about 78 million people worldwide. As only a small amount of manually transcribed audio data were available, the acoustic models were trained on audio data downloaded from several Korean websites in an unsupervised manner, and the language models were trained on web texts. The reported word and character error rates are estimates, as development corpus used in these experiments was also constructed from the untranscribed audio data, the web texts and automatic transcriptions. Several variants for unsupervised acoustic model training were compared to assess the influence of the vocabulary size (200k vs 2M), the type of language model (words vs characters), the acoustic unit (phonemes vs half-syllables), as well as incremental batch vs iterative decoding of the untranscribed audio corpus.