Continuous Word Representation using Neural Networks for Proper Name Retrieval from Diachronic Documents

Dominique Fohr 1, 2 Irina Illina 3, 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
3 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. One approach for increasing the vocabulary coverage of a speech transcription system is to automatically retrieve new proper names from contemporary diachronic text documents. In recent years, neural networks have been successfully applied to a variety of speech recognition tasks. In this paper, we investigate whether neural networks can enhance word representation in vector space for the vocabulary extension of a speech recognition system. This is achieved by using the high-quality vector representations of words, learned from large amounts of unstructured text data, proposed by Mikolov. This model makes it possible to take into account lexical and semantic word relationships. The proposed methodology is evaluated in the context of broadcast news transcription. The obtained recall and ASR proper name error rate are compared to those obtained using a cosine-based vector space methodology. Experimental results show that the proposed model has a good ability to capture semantic and lexical information.
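
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of ranking candidate proper names by cosine similarity in a word vector space. It is not the authors' system: the function names (`cosine`, `centroid`, `rank_proper_names`), the toy 3-dimensional vectors, and the placeholder names `Name_A` and `Name_B` are all assumptions made for illustration. In practice the vectors would come from a model trained on a large text corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_proper_names(context_words, candidates, embeddings, top_k=2):
    """Rank candidate proper names against the mean vector of the context.

    Hypothetical helper: words or candidates without an embedding are skipped.
    """
    known = [embeddings[w] for w in context_words if w in embeddings]
    if not known:
        return []
    query = centroid(known)
    scored = [
        (name, cosine(query, embeddings[name]))
        for name in candidates
        if name in embeddings
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors invented for this example; they carry no real semantics.
TOY_EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "president": [0.8, 0.2, 0.1],
    "match": [0.0, 0.9, 0.2],
    "Name_A": [0.85, 0.15, 0.05],
    "Name_B": [0.1, 0.95, 0.1],
}

ranked = rank_proper_names(
    ["election", "president"], ["Name_A", "Name_B"], TOY_EMBEDDINGS, top_k=1
)
print(ranked[0][0])
```

With these toy vectors, the candidate whose vector lies closest in direction to the averaged context is ranked first. Averaging the context vectors is one simple design choice among several; the paper may combine or weight them differently.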
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01184951
Contributor : Dominique Fohr
Submitted on : Tuesday, August 18, 2015 - 4:15:07 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM

Identifiers

  • HAL Id : hal-01184951, version 1

Citation

Dominique Fohr, Irina Illina. Continuous Word Representation using Neural Networks for Proper Name Retrieval from Diachronic Documents. Interspeech 2015, Sep 2015, Dresden, Germany. ⟨hal-01184951⟩
