Confidence measure for speech indexing based on Latent Dirichlet Allocation

Grégory Senay; Georges Linarès

Conference paper, Year: 2012


Grégory Senay

Role: Corresponding author
PersonId : 982565


Laboratoire Informatique d'Avignon

Georges Linarès

Role: Author
PersonId : 4977
IdHAL : georges-linares
IdRef : 079368794

Laboratoire Informatique d'Avignon

Abstract

This paper presents a confidence measure for speech indexing that aims to predict the indexing quality of a speech document in a Spoken Document Retrieval (SDR) task. We first describe how the indexing quality of a speech document is evaluated. We then present our method for predicting it, which is based on the confidence measures provided by an automatic speech recognition system and on the detection of semantic outliers, implemented with the Latent Dirichlet Allocation (LDA) model. Experiments are conducted on the French broadcast news campaign ESTER2 in a classical SDR scenario where users submit text queries to a search engine. Results demonstrate an overall improvement when the detection is done with the LDA model. The detection rate is always above 70%.

Index Terms: speech indexing, confidence measure, spoken document retrieval, latent Dirichlet allocation
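The abstract does not say how the recognizer's confidence scores and the LDA-based outlier detection are combined. The following is only a minimal sketch of one plausible combination, not the authors' method. It assumes the LDA model has already been trained and that its outputs are available as plain Python structures. Every name here (`RecognizedWord`, `word_given_document`, `semantic_outliers`, `document_confidence`) and every numeric value (`threshold`, `penalty`, the toy topic tables) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str
    asr_confidence: float  # assumed: word posterior from the recognizer, in [0, 1]


def word_given_document(word, doc_topics, topic_word):
    """P(word | doc) = sum over topics k of P(k | doc) * P(word | k)."""
    return sum(p_topic * topic_word[k].get(word, 0.0)
               for k, p_topic in enumerate(doc_topics))


def semantic_outliers(words, doc_topics, topic_word, threshold):
    """Flag words whose probability under the document's topic mixture
    falls below the (hypothetical) threshold."""
    return [word_given_document(w.text, doc_topics, topic_word) < threshold
            for w in words]


def document_confidence(words, doc_topics, topic_word,
                        threshold=0.01, penalty=0.5):
    """Mean ASR confidence, with flagged outliers scaled down by `penalty`.
    The averaging and the penalty are illustrative choices only."""
    if not words:
        return 0.0
    flags = semantic_outliers(words, doc_topics, topic_word, threshold)
    scores = [w.asr_confidence * (penalty if is_outlier else 1.0)
              for w, is_outlier in zip(words, flags)]
    return sum(scores) / len(scores)


# Toy stand-ins for a trained LDA model (assumed values, not from the paper).
topic_word = [
    {"minister": 0.4, "election": 0.4, "vote": 0.2},  # topic 0: P(word | topic)
    {"match": 0.5, "goal": 0.5},                      # topic 1: P(word | topic)
]
doc_topics = [0.9, 0.1]  # P(topic | document)

words = [
    RecognizedWord("minister", 0.9),
    RecognizedWord("election", 0.8),
    RecognizedWord("banana", 0.7),  # likely misrecognition in this context
]

flags = semantic_outliers(words, doc_topics, topic_word, threshold=0.01)
score = document_confidence(words, doc_topics, topic_word)
print(flags, round(score, 3))
```

In this toy run only "banana" is flagged, because it has zero probability under both topics, so its ASR confidence is halved before averaging. A real system would obtain `doc_topics` and `topic_word` from a trained LDA model and tune the threshold on held-out data.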

Domains

Computer Science [cs]

Contributor: Bibliothèque Universitaire Déposants HAL-Avignon

https://hal.science/hal-01320330

Submitted on: Monday, May 23, 2016, 16:39:45

Last modified on: Tuesday, March 22, 2022, 14:40:01

Dates and versions

hal-01320330 , version 1 (23-05-2016)

Identifiers

HAL Id : hal-01320330 , version 1

Cite

Grégory Senay, Georges Linarès. Confidence measure for speech indexing based on Latent Dirichlet Allocation. INTERSPEECH, Sep 2012, Portland, United States. ⟨hal-01320330⟩


Collections

UNIV-AVIGNON LIA

