
Spoken Language Understanding in a Latent Topic-based Subspace

Abstract : The performance of spoken language understanding applications declines when spoken documents are automatically transcribed in noisy conditions, because of high Word Error Rates (WER). To improve robustness to transcription errors, recent solutions map these automatic transcriptions into a latent space. These studies have compared classical topic-based representations such as Latent Dirichlet Allocation (LDA), supervised LDA, and author-topic (AT) models. An original compact representation, called c-vector, has recently been introduced to work around the tricky choice of the number of latent topics in these topic-based representations. Moreover, c-vectors increase the robustness of document classification to transcription errors by compacting different LDA representations of the same speech document into a reduced space, which compensates for most of the noise in the document representation. The main drawback of this method is the number of sub-tasks needed to build the c-vector space. This paper proposes both to improve this compact representation (c-vector) of spoken documents and to reduce the number of sub-tasks needed, using an original framework that builds a robust low-dimensional feature space from a set of AT models, called "Latent Topic-based Subspace" (LTS). In comparison to LDA, the AT model considers not only the dialogue content (words) but also the class related to the document. Experiments are conducted on the DECODA corpus, which contains speech conversations from the call center of the RATP Paris transportation company. Results show that the original LTS representation outperforms the best previous compact representation (c-vector), with a substantial gain of more than 2.5% in terms of correctly labeled conversations.
Document type :
Conference papers
Contributor : Richard Dufour
Submitted on : Thursday, November 14, 2019 - 1:17:46 PM
Last modification on : Friday, November 12, 2021 - 11:18:05 AM
Long-term archiving on : Saturday, February 15, 2020 - 12:50:04 PM


Publisher files allowed on an open archive




Mohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, et al. Spoken Language Understanding in a Latent Topic-based Subspace. Interspeech 2016, Sep 2016, San Francisco, United States. ⟨10.21437/Interspeech.2016-50⟩. ⟨hal-02356390⟩


