Language Model Data Augmentation for Keyword Spotting
Abstract
This research extends our earlier work on using machine
translation (MT) and word-based recurrent neural networks to
augment language model training data for keyword search in
conversational Cantonese speech. MT-based data augmentation
is applied to two language pairs: English-Lithuanian and
English-Amharic. Using filtered N-best MT hypotheses for language
modeling is found to perform better than using only the 1-best
translation. Target-language texts collected from the Web and
filtered to select conversation-like data are used in several
ways. In addition to using Web data for training the language
model of the speech recognizer, we further investigate using this
data to improve the language model and phrase table of the MT
system to get better translations of the English data. Finally,
generating text data with a character-based recurrent neural
network is investigated. This approach allows new word forms to
be produced, providing a way to reduce the out-of-vocabulary
rate and thereby improve keyword spotting performance. We
study how these different methods of language model data
augmentation affect speech-to-text and keyword spotting
performance for the Lithuanian and Amharic languages. The best
results are obtained by combining all of the explored methods.
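
To make the out-of-vocabulary (OOV) argument above concrete, the following is a minimal sketch of how an OOV rate might be measured before and after adding generated word forms to a vocabulary. The function name `oov_rate`, the token-level definition of the rate, and the toy word lists are assumptions made for illustration; they are not taken from the paper's experimental setup.

```python
def oov_rate(vocabulary, test_tokens):
    """Return the fraction of running test tokens missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


# Toy, hand-made data (illustrative only, not corpus statistics).
base_vocab = {"labas", "rytas", "ačiū"}
# Hypothetical new word forms, standing in for text generated by a
# character-based model.
generated_words = {"rytą", "vakaras"}
test_tokens = ["labas", "rytą", "labas", "vakaras", "naktis"]

before = oov_rate(base_vocab, test_tokens)
after = oov_rate(base_vocab | generated_words, test_tokens)
print(f"OOV rate before: {before:.2f}, after: {after:.2f}")
```

In this toy case the rate falls because two of the three previously unseen tokens are covered by the added forms; whether a real system sees a comparable reduction depends on how many generated forms are valid words that actually occur in the test data.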