How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News

Imran Sheikh; Irina Illina; Dominique Fohr

Conference paper, Year: 2016

Imran Sheikh

Role: Author
PersonId : 968903

Speech Modeling for Facilitating Oral-Based Communication

Irina Illina

Role: Author
PersonId : 15663
IdHAL : irina-illina
IdRef : 120731746

Speech Modeling for Facilitating Oral-Based Communication

Dominique Fohr

Role: Author
PersonId : 15652
IdHAL : dominique-fohr
IdRef : 031092942

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of the topic and semantic context of the OOV words, captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpus affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first present our diachronic French broadcast news datasets, which highlight the motivation for our study of OOV PNs. We then analyse the effect of using diachronic text data from different sources and from a different time span. From OOV PN retrieval experiments on French broadcast news videos, we conclude that a diachronic corpus with text from different sources leads to better retrieval performance than one relying on text from a single source or from a longer time span.

Keywords

proper names, OOV, LVCSR, diachronic corpus

Domains

Human-Computer Interaction [cs.HC]

Main file

draft_7Mar2016.pdf (138.06 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01331714

Submitted on: Thursday, 20 October 2016, 09:56:02

Last modified on: Monday, 11 September 2023, 17:41:19

Dates and versions

hal-01331714 , version 1 (20-10-2016)

Identifiers

HAL Id : hal-01331714 , version 1

Cite

Imran Sheikh, Irina Illina, Dominique Fohr. How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News. LREC 2016, May 2016, Portoroz, Slovenia. ⟨hal-01331714⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD ANR

178 views

148 downloads
