Improvement to a NAM-captured whisper-to-speech system - HAL open archive
Journal article, Speech Communication, Year: 2010

Improvement to a NAM-captured whisper-to-speech system

Abstract

Exploiting a tissue-conductive sensor (a stethoscopic microphone), the system developed at NAIST, which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping, is a very promising technique. The quality of the converted speech is, however, still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and because of impoverished phonetic contrasts. This paper presents our investigations to improve the intelligibility and naturalness of the synthesized speech, together with first objective and subjective evaluations of the resulting system. The first improvement concerns voicing and F0 estimation. Instead of using a single GMM for both, we estimate a continuous F0 using a GMM trained on target voiced segments only. The continuous F0 estimate is then filtered by a voicing decision computed by a neural network. The objective and subjective improvement is significant. The second improvement concerns the input time window and its dimensionality reduction: we show that the precision of F0 estimation is also significantly improved by extending the input time window from 90 to 450 ms and by using Linear Discriminant Analysis (LDA) instead of the original Principal Component Analysis (PCA). Estimation of the spectral envelope is also slightly improved with LDA but is degraded with larger time windows. A third improvement consists of adding visual parameters both as input and output parameters. The positive contribution of this information is confirmed by a subjective test. Finally, HMM-based conversion is compared with GMM-based conversion.
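As a rough illustration of two ideas mentioned in the abstract (widening the input time window and filtering a continuous F0 track with a separate voicing decision), here is a minimal Python sketch. It is not the authors' implementation: the function names `stack_context` and `gate_f0`, the edge-padding strategy, the 0.5 threshold, and the convention of marking unvoiced frames with 0.0 are all assumptions made for the example. The GMM mapping, the neural network, and the LDA/PCA projection are left out; the sketch only shows the data handling around them.

```python
from typing import List, Sequence


def stack_context(frames: Sequence[Sequence[float]], half_width: int) -> List[List[float]]:
    """Concatenate each frame with its neighbours to widen the input window.

    Assumption: frames beyond either end are replaced by the nearest
    valid frame (edge padding). The abstract does not specify this.
    """
    n = len(frames)
    stacked: List[List[float]] = []
    for t in range(n):
        row: List[float] = []
        for k in range(t - half_width, t + half_width + 1):
            idx = min(max(k, 0), n - 1)  # clamp index at the sequence edges
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def gate_f0(
    continuous_f0: Sequence[float],
    voicing_prob: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[float]:
    """Keep the continuous F0 estimate only where a frame is judged voiced.

    Assumption: unvoiced frames are marked with 0.0.
    """
    if len(continuous_f0) != len(voicing_prob):
        raise ValueError("F0 track and voicing track must have the same length")
    return [f0 if p >= threshold else 0.0 for f0, p in zip(continuous_f0, voicing_prob)]
```

In such a pipeline, the stacked vectors would typically be reduced in dimension (with PCA or LDA) before the statistical mapping, and `gate_f0` would be applied to the mapped F0 track using the per-frame output of the voicing classifier.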
Main file
SpeechComm_Tran_etal_revision.pdf (922.6 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00459973, version 1 (25-02-2010)

Identifiers

  • HAL Id: hal-00459973, version 1

Cite

Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Tomoki Toda. Improvement to a NAM-captured whisper-to-speech system. Speech Communication, 2010, 52 (4), pp.314-326. ⟨hal-00459973⟩
