Conference paper, Year: 2015

Continuous Word Representation using Neural Networks for Proper Name Retrieval from Diachronic Documents

Abstract

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. One approach to increasing the vocabulary coverage of a speech transcription system is to automatically retrieve new proper names from contemporary diachronic text documents. In recent years, neural networks have been successfully applied to a variety of speech recognition tasks. In this paper, we investigate whether neural networks can enhance word representation in vector space for the vocabulary extension of a speech recognition system. To this end, we use the high-quality continuous word vector representations, learned from large amounts of unstructured text data, proposed by Mikolov. This model takes lexical and semantic word relationships into account. The proposed methodology is evaluated in the context of broadcast news transcription. The recall and ASR proper name error rate obtained are compared with those obtained using a cosine-based vector space methodology. Experimental results show that the proposed model captures semantic and lexical information well.
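The abstract does not spell out the retrieval procedure, so the following is only a minimal sketch of the general idea of ranking candidate proper names by cosine similarity in a word vector space. It assumes that a document context can be summarized by averaging the vectors of its words and that each candidate name has its own vector; the function names, the averaging step, and the toy three-dimensional vectors are all illustrative assumptions, not the authors' method or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding (assumed context model)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("none of the context words has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_proper_names(context_words, candidates, embeddings, top_n=2):
    """Return the top_n candidate names most similar to the averaged context vector."""
    context = mean_vector(context_words, embeddings)
    scored = [
        (name, cosine(context, embeddings[name]))
        for name in candidates
        if name in embeddings
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors invented for illustration only; real vectors would be learned
# from a large text corpus, for example with a word2vec-style model.
toy_embeddings = {
    "election": [0.9, 0.1, 0.0],
    "vote": [0.8, 0.2, 0.1],
    "name_politics": [0.85, 0.15, 0.05],  # hypothetical proper name
    "name_sports": [0.05, 0.95, 0.1],     # hypothetical proper name
}

ranked = rank_proper_names(
    ["election", "vote"], ["name_politics", "name_sports"], toy_embeddings
)
```

In this toy setting the hypothetical politics-related name ranks first because its vector points in nearly the same direction as the averaged context; the names kept after ranking would then be the candidates for addition to the recognizer's vocabulary.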
No file deposited

Dates and versions

hal-01184951, version 1 (18-08-2015)

Identifiers

  • HAL Id: hal-01184951, version 1

Cite

Dominique Fohr, Irina Illina. Continuous Word Representation using Neural Networks for Proper Name Retrieval from Diachronic Documents. Interspeech 2015, Sep 2015, Dresden, Germany. ⟨hal-01184951⟩
