Conference paper. Year: 2008

Predicting F0 and voicing from NAM-captured whispered speech

Abstract

The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) to audible speech by statistical mapping trained on aligned corpora, is a very promising technique. Its performance is still insufficient, however, mainly because of the difficulty of estimating the F0 of the transformed voice from unvoiced speech. In this paper, we propose a method to improve F0 estimation and voicing decision in a NAM-to-speech conversion system based on Gaussian Mixture Models (GMM) applied to whispered speech. Instead of combining voicing decision and F0 estimation in a single GMM, a simple feed-forward neural network detects voiced segments in the whisper, while a GMM trained on voiced segments estimates a continuous melodic contour. The error rate of the network's voiced/unvoiced decision is 6.8%, compared with 9.2% for the original system. The proposed method also improves F0 estimation accuracy.
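
The decoupled design described in the abstract can be illustrated with a small Python sketch: a one-hidden-layer feed-forward network gives a per-frame voicing decision, a joint GMM gives a continuous F0 value for every frame, and the voicing decision gates the output. This is only a toy illustration under stated assumptions. The function names, the 2-D input features, the 1-D GMM source variable, and all parameter values are hypothetical and hand-set; the paper's actual features, network topology, and trained models are not given on this page.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_predict_f0(x, components):
    """Conditional mean E[f0 | x] under a joint GMM over (x, f0).

    Each component holds a weight w, means mx and my, the variance vxx of x,
    and the covariance vxy between x and f0 (all values are assumptions).
    """
    likelihoods = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likelihoods)
    f0 = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        f0 += posterior * (c["my"] + c["vxy"] / c["vxx"] * (x - c["mx"]))
    return f0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def voicing_probability(features, hidden_layer, output_layer):
    """One-hidden-layer feed-forward network returning P(voiced)."""
    hidden = [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in hidden_layer
    ]
    out_weights, out_bias = output_layer
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


def predict_contour(frames, components, hidden_layer, output_layer, threshold=0.5):
    """Continuous F0 from the GMM, gated per frame by the network's decision.

    Unvoiced frames are marked with 0.0 (a convention chosen for this sketch).
    """
    contour = []
    for features in frames:
        voiced = voicing_probability(features, hidden_layer, output_layer) >= threshold
        f0 = gmm_predict_f0(features[0], components)
        contour.append(f0 if voiced else 0.0)
    return contour


# Hypothetical, hand-set parameters (not trained, not from the paper).
COMPONENTS = [
    {"w": 0.5, "mx": -1.0, "my": 110.0, "vxx": 1.0, "vxy": 5.0},
    {"w": 0.5, "mx": 1.0, "my": 140.0, "vxx": 1.0, "vxy": 5.0},
]
HIDDEN_LAYER = [([4.0, 0.0], 0.0), ([0.0, 4.0], 0.0)]
OUTPUT_LAYER = ([3.0, 3.0], -3.0)

# Two toy frames: the second feature drives the voicing decision here.
FRAMES = [[0.0, 2.0], [0.0, -2.0]]
CONTOUR = predict_contour(FRAMES, COMPONENTS, HIDDEN_LAYER, OUTPUT_LAYER)
```

Because the GMM in this sketch produces an F0 value for every frame, the voicing network can be trained and evaluated independently of it, which is the separation the abstract contrasts with a single GMM handling both tasks.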
Main file
vat_SP08.pdf (739.81 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00333290, version 1 (22-10-2008)

Identifiers

  • HAL Id: hal-00333290, version 1

Cite

Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Tomoki Toda. Predicting F0 and voicing from NAM-captured whispered speech. Speech Prosody 2008 - 4th International Conference on Speech Prosody, May 2008, Campinas, Brazil. pp.107-110. ⟨hal-00333290⟩