Journal articles

A study of continuous space word and sentence representations applied to ASR error detection

Abstract : This paper presents a study of continuous word representations applied to the automatic detection of speech recognition errors. A neural network architecture is proposed that is well suited to handling continuous word representations such as word embeddings. We explore the use of several types of word representations: simple and combined linguistic embeddings, and acoustic embeddings associated with prosodic features extracted from the audio signal. To compensate for certain phenomena highlighted by the analysis of the average error span, we propose to model errors at the sentence level through the use of sentence embeddings. An approach to building continuous sentence representations dedicated to ASR error detection is also proposed and compared to the Doc2vec approach. Experiments are performed on automatic transcriptions generated by the LIUM ASR system applied to the French ETAPE corpus. They show that the combination of linguistic embeddings, acoustic embeddings, prosodic features, and sentence embeddings, in addition to more classical features, yields very competitive results. In particular, these results show the complementarity of acoustic embeddings and prosodic information, and show that the proposed sentence embeddings dedicated to ASR error detection achieve better results than generic sentence embeddings.

Cited literature [49 references]

https://hal.archives-ouvertes.fr/hal-02501943
Contributor : Sahar Ghannay
Submitted on : Sunday, March 8, 2020 - 4:08:37 PM
Last modification on : Thursday, March 12, 2020 - 3:34:46 PM

File

A_study_of_continuous_space_wo...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02501943, version 1

Citation

Sahar Ghannay, Yannick Estève, Nathalie Camelin. A study of continuous space word and sentence representations applied to ASR error detection. Speech Communication, Elsevier : North-Holland, 2020. ⟨hal-02501943⟩
