Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
Conference paper. Year: 2014


Abstract

For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem, both for transcription and for keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for each subword type, and 3) a single decoding using all possible subword units. In these experiments, the best performance is achieved by carrying out a separate decoding for each subword type. Further gains are attained through system combination. We also find that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection. Results are presented on four languages from the IARPA Babel Program (Haitian Creole, Assamese, Bengali, and Zulu).
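The first strategy above (converting a word-based lattice to a subword lattice after decoding) and the effect of word boundaries on keyword search can be illustrated with a small sketch. Everything concrete here is an assumption, not the authors' implementation: the arc tuple format `(start_node, end_node, label, posterior)`, the toy segmentation lexicon `SEGMENTS`, the convention of prefixing word-initial units with `_`, and the hypothetical helpers `word_to_subword_lattice` and `search`. Each word arc is expanded into a chain of subword arcs that inherit its posterior. A keyword, given as a subword sequence, is then matched along lattice paths, comparing units either with their boundary markers or with the markers stripped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical arc format: (start_node, end_node, label, posterior).
# Node IDs are plain identifiers here, not times.
Arc = Tuple[int, int, str, float]

# Toy segmentation lexicon (an assumption for illustration). A leading "_"
# marks a word-initial unit, so word boundaries survive the conversion.
SEGMENTS: Dict[str, List[str]] = {
    "bakata": ["_ba", "ka", "ta"],
    "lo": ["_lo"],
}


def word_to_subword_lattice(arcs: List[Arc], segments: Dict[str, List[str]]) -> List[Arc]:
    """Replace each word arc by a chain of subword arcs (raises KeyError for unknown words)."""
    next_node = max((max(s, e) for s, e, _, _ in arcs), default=-1) + 1
    out: List[Arc] = []
    for start, end, word, post in arcs:
        units = segments[word]
        # Fresh intermediate nodes, so chains built from different arcs never merge.
        nodes = [start] + list(range(next_node, next_node + len(units) - 1)) + [end]
        next_node += len(units) - 1
        for i, unit in enumerate(units):
            # Every subword arc inherits the posterior of its source word arc.
            out.append((nodes[i], nodes[i + 1], unit, post))
    return out


def search(arcs: List[Arc], keyword_units: List[str],
           ignore_boundaries: bool) -> List[Tuple[int, int, float]]:
    """Return (start_node, end_node, score) for every path spelling the keyword."""
    # Ignoring word boundaries = comparing units with the "_" marker stripped.
    norm = (lambda u: u.lstrip("_")) if ignore_boundaries else (lambda u: u)
    target = [norm(u) for u in keyword_units]
    by_start: Dict[int, List[Arc]] = defaultdict(list)
    for arc in arcs:
        by_start[arc[0]].append(arc)
    hits = []
    # Depth-first extension from every arc matching the first keyword unit;
    # this terminates because a lattice is acyclic.
    stack = [(a[1], 1, a[0], a[3]) for a in arcs if norm(a[2]) == target[0]]
    while stack:
        node, matched, origin, score = stack.pop()
        if matched == len(target):
            hits.append((origin, node, score))
            continue
        for _, end, label, post in by_start[node]:
            if norm(label) == target[matched]:
                # The minimum arc posterior is an upper bound on the path posterior.
                stack.append((end, matched + 1, origin, min(score, post)))
    return hits


if __name__ == "__main__":
    word_lattice = [(0, 1, "lo", 0.9), (1, 2, "bakata", 0.6)]
    sub_lattice = word_to_subword_lattice(word_lattice, SEGMENTS)
    keyword = ["_ka", "ta"]  # hypothetical keyword "kata", absent from SEGMENTS
    print(search(sub_lattice, keyword, ignore_boundaries=False))  # []
    print(search(sub_lattice, keyword, ignore_boundaries=True))   # [(3, 2, 0.6)]
```

With boundaries respected, the word-initial unit `_ka` cannot match the word-internal `ka` of `bakata`; with the markers stripped, the keyword matches inside that word. The looser matching is what can recover OOV keywords, but it can also introduce false alarms. Scoring a hit by the minimum arc posterior is a simplification; a real system would compute path posteriors with the forward-backward algorithm over the lattice.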

Dates and versions

hal-01843408, version 1 (18-07-2018)

Identifiers

  • HAL Id: hal-01843408, version 1

Cite

William Hartmann, Viet Bac Le, Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain. Comparing decoding strategies for subword-based keyword spotting in low-resourced languages. Annual Conference of the International Speech Communication Association, ISCA, Sep 2014, Singapore, Singapore. ⟨hal-01843408⟩