Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents

Irina Illina (1), Dominique Fohr (1)
(1) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. Increasing vocabulary coverage requires a huge amount of text data. In this paper, we extend previously proposed neural network models for word embedding: the word vector representation proposed by Mikolov is enriched with an additional non-linear transformation. This model better accounts for lexical and semantic relationships between words. In the context of broadcast news transcription, experimental results show that, in terms of recall, the proposed model is well able to select new relevant proper names.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01240480
Contributor : Dominique Fohr
Submitted on : Thursday, December 10, 2015 - 9:51:49 AM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Document(s) archived on : Saturday, April 29, 2017 - 8:13:56 AM

File

ltc-009-illina.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01240480, version 1

Citation

Irina Illina, Dominique Fohr. Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents. LTC Language & Technology Conference, Nov 2015, Poznan, Poland. pp.120-124. ⟨hal-01240480⟩
