Confidence measure for speech indexing based on Latent Dirichlet Allocation

Abstract : This paper presents a confidence measure for speech indexing that predicts the indexing quality of a speech document in a Spoken Document Retrieval (SDR) task. We first describe how the indexing quality of a speech document is evaluated, then present our method for predicting it. The method is based on the confidence measures provided by an automatic speech recognition system and on the detection of semantic outliers, implemented with the Latent Dirichlet Allocation (LDA) model. Experiments are conducted on the ESTER2 French broadcast news campaign in a classical SDR scenario where users submit text queries to a search engine. Results show an overall improvement when the detection is done with the LDA model; the detection rate is always above 70%.
Index Terms: speech indexing, confidence measure, spoken document retrieval, Latent Dirichlet Allocation
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01320330
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Monday, May 23, 2016 - 4:39:45 PM
Last modification on : Saturday, March 23, 2019 - 1:22:11 AM

Identifiers

  • HAL Id : hal-01320330, version 1

Citation

Grégory Senay, Georges Linarès. Confidence measure for speech indexing based on Latent Dirichlet Allocation. INTERSPEECH, Sep 2012, Portland, United States. ⟨hal-01320330⟩
