Comparing decoding strategies for subword-based keyword spotting in low-resourced languages

William Hartmann; Viet Bac Le; Abdelkhalek Messaoudi; Lori Lamel; Jean-Luc Gauvain

Conference paper, Year: 2014


William Hartmann

Role: Author
PersonId : 1034725

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Viet Bac Le

Role: Author

Abdelkhalek Messaoudi

Role: Author
PersonId : 1034373

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Lori Lamel

Role: Author
PersonId : 15965
IdHAL : lori-lamel
ORCID : 0000-0001-7443-9938
IdRef : 127578056

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Jean-Luc Gauvain

Role: Author
PersonId : 1034324

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Abstract

For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem, both for transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for each subword type, and 3) a single decoding using all possible subword units. In these experiments, the best performance is achieved by carrying out a separate decoding for each subword type. Further gains are attained through system combination. We also find that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection. Results are presented on four languages from the IARPA Babel Program (Haitian Creole, Assamese, Bengali, and Zulu).
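The effect of ignoring word boundaries can be illustrated with a toy sketch. It is not the authors' system: it works on a single 1-best word hypothesis rather than a lattice, and the segmentation table `SEGMENTS` and the functions `to_subwords` and `find_keyword` are hypothetical names introduced only for this illustration. The hypothesis is converted from words to subword units after decoding (in the spirit of strategy 1), and a keyword is then searched as a unit sequence, either requiring word-start positions to agree or ignoring them.

```python
from typing import Dict, List, Tuple

# Hypothetical word-to-subword segmentation table (illustrative only; the
# subword inventories used in the paper are not given in this record).
SEGMENTS: Dict[str, List[str]] = {
    "keyword": ["key", "word"],
    "key": ["key"],
    "words": ["word", "s"],
}

Unit = Tuple[str, bool]  # (subword unit, True if it starts a word)


def to_subwords(words: List[str]) -> List[Unit]:
    """Expand each word into its subword units, marking word-initial units."""
    units: List[Unit] = []
    for word in words:
        pieces = SEGMENTS.get(word, [word])  # unknown words stay whole
        for i, piece in enumerate(pieces):
            units.append((piece, i == 0))
    return units


def find_keyword(hyp: List[Unit], query: List[Unit],
                 ignore_boundaries: bool) -> List[int]:
    """Return start indices in hyp where the query unit sequence occurs.

    With ignore_boundaries=False the word-start flags must match as well,
    so a keyword cannot be assembled from pieces of neighbouring words.
    """
    hits: List[int] = []
    n = len(query)
    for start in range(len(hyp) - n + 1):
        window = hyp[start:start + n]
        if ignore_boundaries:
            matched = [u for u, _ in window] == [u for u, _ in query]
        else:
            matched = window == query
        if matched:
            hits.append(start)
    return hits


# The recogniser (assumed here) does not know "keyword" and outputs two
# in-vocabulary words instead.
hypothesis = to_subwords(["key", "words"])
query = to_subwords(["keyword"])

strict_hits = find_keyword(hypothesis, query, ignore_boundaries=False)
relaxed_hits = find_keyword(hypothesis, query, ignore_boundaries=True)
print(strict_hits)   # []
print(relaxed_hits)  # [0]
```

In this toy case the OOV keyword is only found once word boundaries are ignored, because its units are split across two recognised words. A real system would search lattices and score the hits, which this sketch does not attempt.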

Keywords

keyword search; spoken term detection; OOV; sub-word lexical units; low resource; LVCSR

Domains

Computer Science [cs]; Computation and Language [cs.CL]


https://hal.science/hal-01843408

Submitted on: Wednesday, 18 July 2018, 16:56:42

Last modified on: Saturday, 7 October 2023, 21:36:20

Dates and versions

hal-01843408, version 1 (18-07-2018)

Identifiers

HAL Id: hal-01843408, version 1

Cite

William Hartmann, Viet Bac Le, Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain. Comparing decoding strategies for subword-based keyword spotting in low-resourced languages. Annual Conference of the International Speech Communication Association, ISCA, Sep 2014, Singapore, Singapore. ⟨hal-01843408⟩


Collections

CNRS LIMSI SORBONNE-UNIVERSITE LISN

