Document Level Semantic Context for Retrieving OOV Proper Names

Imran Sheikh 1 Irina Illina 1 Dominique Fohr 1 Georges Linares 2
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Recognition of Proper Names (PNs) in speech is important for content based indexing and browsing of audio-video data. However, many PNs are Out-Of-Vocabulary (OOV) words for the LVCSR systems used in these applications, due to the diachronic nature of the data. By exploiting the semantic context of the audio, relevant OOV PNs can be retrieved and the target PNs can then be recovered. To retrieve OOV PNs, we propose to represent their context with document level semantic vectors, and show that this approach is able to handle OOV PNs that are less frequent in the training data. We study different representations, including Random Projections, LSA, LDA, Skip-gram, CBOW and GloVe. A further evaluation of recovery of target OOV PNs using a phonetic search shows that document level semantic context is reliable for recovery of OOV PNs.
Document type :
Conference papers

Cited literature : 27 references

https://hal.archives-ouvertes.fr/hal-01331716
Contributor : Dominique Fohr
Submitted on : Thursday, October 20, 2016 - 9:58:51 AM
Last modification on : Wednesday, April 3, 2019 - 1:22:59 AM

File

draft-16Jan16 (1).pdf
Files produced by the author(s)

Citation

Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linares. Document Level Semantic Context for Retrieving OOV Proper Names. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2016, Shanghai, China. pp.6050-6054, ⟨10.1109/ICASSP.2016.7472839⟩. ⟨hal-01331716⟩
