A Comparison of Normalization Techniques Applied to Latent Space Representations for Speech Analytics

Abstract : In noisy environments, Automatic Speech Recognition (ASR) systems usually produce poor-quality transcriptions, which in turn degrade the performance of speech analytics. Various methods have been proposed to compensate for the effect of ASR errors, mainly by projecting transcribed words into an abstract space. In this paper, we seek to identify themes in dialogues from telephone conversation services using latent topic spaces estimated with latent Dirichlet allocation (LDA). A document can then be represented by a vector containing the probability of its association with each topic estimated with LDA. This vector should nonetheless be normalized to condition document representations. We compare the original LDA vector representation (without normalization) with two normalization approaches, Eigen Factor Radial (EFR) and Feature Warping (FW), which have already been applied successfully in the speaker recognition field but never compared or evaluated in the context of a speech analytics task. Results show the benefit of these normalization techniques for theme identification from automatic transcriptions. The EFR normalization approach yields gains of 3.67 and 3.06 points over the absence of normalization and over the FW normalization technique, respectively.
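
To make the idea of normalizing topic-probability vectors concrete, the following is a minimal sketch of a rank-based feature warping step, under stated assumptions. The abstract does not give the authors' formulation, so this is not their implementation: the function name `feature_warp`, the use of the whole collection as the warping window (speaker recognition typically warps over a sliding window), and the four example vectors are all hypothetical. EFR is not sketched here.

```python
from statistics import NormalDist


def feature_warp(vectors):
    """Map each dimension of a set of vectors onto a standard normal via ranks.

    Assumption: the whole collection serves as the warping window.
    Ties within a dimension are broken by document order.
    """
    n = len(vectors)
    dims = len(vectors[0])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    warped = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Order document indices by their value on dimension d.
        order = sorted(range(n), key=lambda i: vectors[i][d])
        for rank, i in enumerate(order, start=1):
            # (rank - 0.5) / n lies strictly inside (0, 1), so inv_cdf is defined.
            warped[i][d] = std_normal.inv_cdf((rank - 0.5) / n)
    return warped


# Hypothetical topic-probability vectors: four documents over three topics.
docs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.25, 0.25, 0.50],
    [0.40, 0.35, 0.25],
]
warped = feature_warp(docs)
```

After warping, each topic dimension keeps the original ordering of documents, but its values follow the quantiles of a standard normal distribution rather than the skewed distribution of raw probabilities.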
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-02356373
Contributor : Richard Dufour
Submitted on : Thursday, November 14, 2019 - 1:17:17 PM
Last modification on : Friday, November 15, 2019 - 1:26:21 AM

Identifiers

  • HAL Id : hal-02356373, version 1

Citation

Mohamed Morchid, Richard Dufour, Driss Matrouf. A Comparison of Normalization Techniques Applied to Latent Space Representations for Speech Analytics. Interspeech 2015, Sep 2015, Dresden, Germany. ⟨hal-02356373⟩
