Conference papers

A LDA-Based Topic Classification Approach from Highly Imperfect Automatic Transcriptions

Abstract : Although current transcription systems can achieve high recognition performance, they still have great difficulty transcribing speech in very noisy environments. Transcription quality has a direct impact on classification tasks that use text features. In this paper, we propose to identify the themes of telephone conversation services with the classical Term Frequency-Inverse Document Frequency method using the Gini purity criterion (TF-IDF-Gini) and with a Latent Dirichlet Allocation (LDA) approach. Both approaches are coupled with a Support Vector Machine (SVM) classifier to solve the theme identification problem. Results show the effectiveness of the proposed LDA-based method compared to the classical TF-IDF-Gini approach in the context of highly imperfect automatic transcriptions. Finally, we discuss the impact of discriminative and non-discriminative words extracted by both methods in terms of transcription accuracy.
Contributor : bibliothèque Universitaire Déposants HAL-Avignon
Submitted on : Monday, May 23, 2016 - 8:45:02 AM
Last modification on : Tuesday, March 22, 2022 - 2:40:01 PM


  • HAL Id : hal-01319771, version 1



Mohamed Morchid, Richard Dufour, Georges Linarès. A LDA-Based Topic Classification Approach from Highly Imperfect Automatic Transcriptions. LREC, May 2014, Reykjavik, Iceland. ⟨hal-01319771⟩


