Dynamic Extension of ASR Lexicon Using Wikipedia Data

Badr Abdullah 1, Irina Illina 1, Dominique Fohr 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Despite recent progress in Large Vocabulary Continuous Speech Recognition (LVCSR) systems, these systems still suffer from Out-Of-Vocabulary (OOV) words. In many cases, the OOV words are Proper Nouns (PNs). Correct recognition of PNs is essential for applications such as broadcast news and audio indexing. In this article, we address the problem of OOV PN retrieval in the framework of broadcast news LVCSR. We focus on dynamic (document-dependent) extension of the LVCSR lexicon. To retrieve relevant OOV PNs, we propose to use a very large multipurpose text corpus: Wikipedia. This corpus contains a huge number of PNs, which are grouped into semantically similar classes using word embeddings. We use a two-step approach: first, we select the classes pertinent to the OOV PNs with a multi-class Deep Neural Network (DNN); second, we rank the OOVs of the selected classes. Experiments on French broadcast news show that the Bi-GRU model outperforms the other studied models. Speech recognition experiments demonstrate the effectiveness of the proposed methodology.
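The two-step approach described in the abstract (class selection, then ranking of the OOV PNs inside the selected classes) can be pictured with the minimal sketch below. It is an illustration only, not the authors' implementation: the class scores stand in for the output of the multi-class DNN, cosine similarity to a document vector is an assumed ranking criterion that the abstract does not specify, and all function names, class labels, PN identifiers, and vectors are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_classes(class_scores, top_k):
    """Step 1: keep the top_k PN classes by classifier score.

    class_scores is a stand-in for the per-class output of a multi-class DNN.
    """
    ranked = sorted(class_scores, key=class_scores.get, reverse=True)
    return ranked[:top_k]


def rank_oov_candidates(doc_vector, selected, class_members, embeddings, top_n):
    """Step 2: rank the OOV PNs of the selected classes.

    Assumption: candidates are ranked by cosine similarity between their
    embedding and a document vector; the abstract does not state the criterion.
    """
    # Sort alphabetically first so that ties are broken deterministically.
    candidates = sorted({pn for c in selected for pn in class_members[c]})
    ranked = sorted(
        candidates,
        key=lambda pn: cosine(doc_vector, embeddings[pn]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical toy data: three PN classes, four OOV PNs, 2-d embeddings.
class_scores = {"class_0": 0.7, "class_1": 0.2, "class_2": 0.1}
class_members = {
    "class_0": ["pn_a", "pn_b"],
    "class_1": ["pn_c"],
    "class_2": ["pn_d"],
}
embeddings = {
    "pn_a": [0.9, 0.1],
    "pn_b": [0.6, 0.4],
    "pn_c": [0.1, 0.9],
    "pn_d": [0.0, 1.0],
}
doc_vector = [1.0, 0.0]

selected = select_classes(class_scores, top_k=1)
lexicon_extension = rank_oov_candidates(
    doc_vector, selected, class_members, embeddings, top_n=2
)
print(selected, lexicon_extension)
```

In this sketch, restricting the ranking to the selected classes keeps the candidate list small for each document, which is the motivation for a two-step design; the returned PNs would be the candidates added to the lexicon for that document.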
Document type:
Conference paper
IEEE Workshop on Spoken and Language Technology (SLT), Dec 2018, Athens, Greece. 2018, Proceedings of IEEE SLT

https://hal.archives-ouvertes.fr/hal-01874495
Contributor: Irina Illina
Submitted on: Friday, September 14, 2018 - 13:11:57
Last modified on: Tuesday, December 18, 2018 - 16:38:02
Document(s) archived on: Saturday, December 15, 2018 - 15:20:47

File

Abdullah.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01874495, version 1

Citation

Badr Abdullah, Irina Illina, Dominique Fohr. Dynamic Extension of ASR Lexicon Using Wikipedia Data. IEEE Workshop on Spoken and Language Technology (SLT), Dec 2018, Athens, Greece. 2018, Proceedings of IEEE SLT. 〈hal-01874495〉
