Conference paper, Year: 2012

Confidence measure for speech indexing based on Latent Dirichlet Allocation

Abstract

This paper presents a confidence measure for speech indexing that aims to predict the indexing quality of a speech document in a Spoken Document Retrieval (SDR) task. We first describe how the indexing quality of a speech document is evaluated. We then present our method for predicting it. The method is based on a confidence measure provided by an automatic speech recognition system and on the detection of semantic outliers, implemented with the Latent Dirichlet Allocation (LDA) model. Experiments are conducted on the French broadcast news campaign ESTER2 in a classical SDR scenario where users submit text queries to a search engine. Results show an overall improvement when the detection is done with the LDA model; the detection rate is always above 70%.

Index Terms: speech indexing, confidence measure, spoken document retrieval, latent Dirichlet allocation
No file deposited

Dates and versions

hal-01320330, version 1 (23-05-2016)

Identifiers

  • HAL Id: hal-01320330, version 1

Cite

Grégory Senay, Georges Linarès. Confidence measure for speech indexing based on Latent Dirichlet Allocation. INTERSPEECH, Sep 2012, Portland, United States. ⟨hal-01320330⟩

Collections

UNIV-AVIGNON LIA
33 views
0 downloads
