Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
Abstract
For languages with limited training resources, out-of-vocabulary
(OOV) words are a significant problem for both transcription and
keyword spotting. This paper investigates the use of subword
lexical units for keyword spotting. Three strategies for using the
subword units are explored: 1) converting word-based lattices to
subword lattices after decoding, 2) performing a separate decoding
for each subword type, and 3) performing a single decoding using
all possible subword units. In these experiments, the best
performance is achieved by carrying out a separate decoding for
each subword type. Further gains are attained through system
combination. We also find that ignoring word boundaries improves
the detection of OOV keywords without significantly impacting
in-vocabulary keyword detection. Results are presented on four
languages from the IARPA Babel Program (Haitian Creole, Assamese,
Bengali, and Zulu).
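The effect of ignoring word boundaries can be illustrated with a minimal sketch. It is not the paper's system: it searches a single toy hypothesis rather than a lattice, and the names `flatten`, `find_keyword`, the `|` boundary token, and the example units are all hypothetical, chosen only for illustration. The idea shown is that a keyword whose subword units span two hypothesized words is found only when boundaries are dropped.

```python
from typing import List, Sequence

# Hypothetical toy data: each hypothesized word is a list of subword units
# (for example, syllables). Nothing here is taken from the paper's systems.
Word = List[str]


def flatten(words: Sequence[Word], keep_boundaries: bool) -> List[str]:
    """Concatenate subword units, optionally inserting a '|' boundary token."""
    units: List[str] = []
    for i, word in enumerate(words):
        if keep_boundaries and i > 0:
            units.append("|")  # assumed boundary marker between words
        units.extend(word)
    return units


def find_keyword(units: Sequence[str], keyword: Sequence[str]) -> List[int]:
    """Return start indices where the keyword's units occur contiguously."""
    n, k = len(units), len(keyword)
    target = list(keyword)
    return [i for i in range(n - k + 1) if list(units[i:i + k]) == target]


# The decoder hypothesized three words; the keyword spans the first two.
hyp = [["ba"], ["na", "na"], ["split"]]
keyword = ["ba", "na", "na"]

with_boundaries = find_keyword(flatten(hyp, keep_boundaries=True), keyword)
without_boundaries = find_keyword(flatten(hyp, keep_boundaries=False), keyword)
```

In this toy case the boundary token blocks the match, while the boundary-free sequence yields a hit at the first unit. A real keyword search system would operate on lattices with posterior scores and time stamps, which this sketch leaves out.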