Conference papers

Char+CV-CTC: Combining Graphemes and Consonant/Vowel Units for CTC-Based ASR Using Multitask Learning

Abstract : Previous work has shown that end-to-end neural speech recognition systems can be improved by adding auxiliary tasks at intermediate layers. In this paper, we report multitask learning (MTL) experiments in the context of connectionist temporal classification (CTC) based speech recognition at the character level. We compare several MTL architectures that jointly learn to predict characters (sometimes called graphemes) and consonant/vowel (CV) binary labels. The best approach, which we call Char+CV-CTC, sums the character and CV logits to obtain the final character predictions. The idea is to put more weight on the vowel characters when the auxiliary-task branch of the network predicts the vowel symbol ‘V’, and more weight on the consonant characters when it predicts the consonant symbol ‘C’. Experiments were carried out on the Wall Street Journal (WSJ) corpus. Char+CV-CTC achieved the best ASR results, with a 2.2% Character Error Rate (CER) and a 6.1% Word Error Rate (WER) on the Eval92 evaluation subset. This model outperformed its monotask counterpart by 0.7% absolute in WER and came close to the 6.0% WER of a strong phone-based Time Delay Neural Network baseline (“TDNN-Phone+TR2”).
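The logit combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the two-class CV layout, the vowel set, and the choice to leave non-letter symbols (blank, space) unchanged are all assumptions made for illustration, and the function names are hypothetical.

```python
# Hypothetical toy vocabulary; the paper's actual symbol set may differ.
VOCAB = ["<blank>", " ", "a", "b", "c", "e"]
VOWELS = set("aeiou")  # assumption: only these letters count as vowels

# Assumed layout of the auxiliary CV branch: index 0 = 'C', index 1 = 'V'.
CV_INDEX = {"C": 0, "V": 1}


def cv_class(symbol):
    """Return 'V' or 'C' for a letter, or None for non-letter symbols."""
    if len(symbol) == 1 and symbol.isalpha():
        return "V" if symbol in VOWELS else "C"
    return None


def combine_logits(char_logits, cv_logits):
    """Add the matching CV logit to each character logit, frame by frame."""
    combined = []
    for char_frame, cv_frame in zip(char_logits, cv_logits):
        frame = []
        for symbol, logit in zip(VOCAB, char_frame):
            cls = cv_class(symbol)
            # Assumption: blank and space receive no CV contribution.
            bonus = cv_frame[CV_INDEX[cls]] if cls is not None else 0.0
            frame.append(logit + bonus)
        combined.append(frame)
    return combined


def argmax_symbols(logits):
    """Pick the highest-scoring symbol in each frame."""
    return [VOCAB[max(range(len(f)), key=f.__getitem__)] for f in logits]


# One frame in which 'b' narrowly beats 'e' in the character branch ...
char_logits = [[0.0, 0.0, 0.5, 2.0, 0.1, 1.8]]
# ... while the CV branch favours a vowel.
cv_logits = [[-1.0, 1.0]]

combined = combine_logits(char_logits, cv_logits)
print(argmax_symbols(char_logits), argmax_symbols(combined))
```

In this toy frame the character branch alone would pick a consonant, while the summed logits pick a vowel, which is the reweighting effect the abstract describes. In a trained system the combined logits would then feed the CTC loss and decoder.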
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-02419431
Contributor : Open Archive Toulouse Archive Ouverte (oatao)
Submitted on : Thursday, December 19, 2019 - 2:41:40 PM
Last modification on : Wednesday, November 3, 2021 - 6:51:41 AM
Long-term archiving on : Friday, March 20, 2020 - 6:31:43 PM

File

heba_25028.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02419431, version 1
  • OATAO : 25028

Citation

Abdelwahab Heba, Thomas Pellegrini, Jean-Pierre Lorré, Régine André-Obrecht. Char+CV-CTC: Combining Graphemes and Consonant/Vowel Units for CTC-Based ASR Using Multitask Learning. 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), Sep 2019, Graz, Austria. pp.1611-1615. ⟨hal-02419431⟩
