Conference papers

Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition

Abstract: This paper introduces approaches based on vocal tract length normalisation (VTLN) for hybrid deep neural network (DNN)-hidden Markov model (HMM) automatic speech recognition targeting children's and adults' speech. VTLN is first investigated by training a DNN-HMM system on mel frequency cepstral coefficients (MFCCs) normalised with standard VTLN. Then, MFCC-derived acoustic features are combined with the VTLN warping factors to obtain an augmented set of features as input to a DNN. In this latter, novel approach, the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Both VTLN-based approaches are shown to improve phone error rate, by up to 20% relative, compared to a baseline trained on a mixture of children's and adults' speech.
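
The feature augmentation and the "relative improvement" figure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to append one utterance-level warping factor to every frame, and all numeric values are assumptions made for illustration only. In the paper the warping factor comes from a separate DNN; here it is simply passed in as a number.

```python
from typing import List


def augment_with_warping_factor(
    frames: List[List[float]], warp: float
) -> List[List[float]]:
    """Append a warping factor to each MFCC-derived feature frame.

    Assumption: a single factor per utterance is appended to every
    frame. The abstract only says the features are "combined" with
    the warping factors, so the exact scheme may differ.
    """
    # Build new lists so the input frames are left unmodified.
    return [frame + [warp] for frame in frames]


def relative_improvement(baseline_per: float, system_per: float) -> float:
    """Relative phone error rate (PER) reduction, in percent."""
    return 100.0 * (baseline_per - system_per) / baseline_per


# Toy 2-dimensional frames and a made-up warping factor.
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment_with_warping_factor(frames, 1.08)

# Made-up error rates chosen only to show the arithmetic.
gain = relative_improvement(25.0, 20.0)
```

With these illustrative numbers, each augmented frame carries one extra dimension holding the warping factor, and a drop from 25.0% to 20.0% PER corresponds to a 20% relative improvement (5.0 / 25.0).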

https://hal.archives-ouvertes.fr/hal-01393972
Contributor: Romain Serizel
Submitted on: Wednesday, November 9, 2016 - 2:52:36 PM
Last modification on: Friday, July 31, 2020 - 10:44:09 AM
Long-term archiving on: Tuesday, March 14, 2017 - 7:36:46 PM

File

14-3.pdf
Files produced by the author(s)

Citation

Romain Serizel, Diego Giuliani. Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition. 2014 IEEE Spoken Language Technology Workshop (SLT 2014), Dec 2014, South Lake Tahoe, CA, United States. pp.135-140, ⟨10.1109/SLT.2014.7078563⟩. ⟨hal-01393972⟩
