Journal article, Computer Speech and Language, Year: 2014

On the use of voice descriptors for glottal source shape parameter estimation

Abstract

This paper summarizes the results of our investigations into estimating the shape of the glottal excitation source from speech signals. We employ the Liljencrants-Fant (LF) model, which describes the glottal flow and its derivative. The one-dimensional glottal source shape parameter Rd describes the transition in voice quality from a tense to a breathy voice. The parameter Rd has been derived from a statistical regression of the R waveshape parameters that parameterize the LF model. First, we introduce a variant of our recently proposed adaptation and range extension of the Rd parameter regression. Second, we discuss in detail the aspects of estimating the glottal source shape parameter Rd using the phase minimization paradigm. Based on the analysis of a large number of speech signals, we describe the major conditions that are likely to result in erroneous Rd estimates. Based on these findings, we investigate means to increase the robustness of the Rd parameter estimation. We use Viterbi smoothing to suppress unnatural jumps of the estimated Rd parameter contours within short time segments. Additionally, we propose to steer the Viterbi algorithm by exploiting the covariation of other voice descriptors to improve Viterbi smoothing. The novel Viterbi steering is based on a Gaussian Mixture Model (GMM) that represents the joint density of the voice descriptors and the Open Quotient (OQ) estimated from corresponding electroglottographic (EGG) signals. A conversion function derived from the mixture model predicts OQ from the voice descriptors. Converted to Rd, it defines an additional prior probability to adapt the partial probabilities of the Viterbi algorithm accordingly. Finally, we evaluate the performance of the phase-minimization-based methods using both variants to adapt and extend the Rd regression on one synthetic test set, as well as in combination with Viterbi smoothing and each variant of the novel Viterbi steering on one test set of natural speech. The experimental findings exhibit improvements for both Viterbi approaches.
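The idea of Viterbi smoothing with an optional steering prior can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate grid, the per-frame cost matrix `obs_cost` (standing in for a phase minimization error per Rd candidate), the squared-jump transition cost, and the `prior_cost` term (standing in for a prior derived from a GMM-based OQ prediction) are all assumptions made for illustration.

```python
import numpy as np

def viterbi_smooth(obs_cost, grid, jump_weight=1.0, prior_cost=None):
    """Pick one Rd candidate per frame by minimizing the total path cost.

    obs_cost   : (T, K) hypothetical per-frame cost of each candidate (lower is better)
    grid       : (K,) candidate Rd values
    jump_weight: assumed weight of the squared-jump transition penalty
    prior_cost : optional (T, K) extra cost, e.g. a negative log prior (assumption)
    """
    cost = np.asarray(obs_cost, dtype=float)
    if prior_cost is not None:
        cost = cost + np.asarray(prior_cost, dtype=float)
    grid = np.asarray(grid, dtype=float)
    T, K = cost.shape

    # trans[i, j]: penalty for moving from candidate i to candidate j
    trans = jump_weight * (grid[:, None] - grid[None, :]) ** 2

    acc = cost[0].copy()                 # best accumulated cost ending in each candidate
    back = np.zeros((T, K), dtype=int)   # best predecessor index per frame and candidate
    for t in range(1, T):
        total = acc[:, None] + trans     # rows: previous candidate, columns: current
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(K)] + cost[t]

    # Backtrack from the cheapest final candidate
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(acc))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return grid[path]

# Toy data: five frames whose cost favours Rd = 1.0, except one outlier frame
grid = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
obs_cost = np.tile(np.abs(grid - 1.0), (5, 1))
obs_cost[2] = [1.0, 0.2, 1.0, 1.0, 0.0]  # frame 2 spuriously prefers Rd = 2.5

raw = viterbi_smooth(obs_cost, grid, jump_weight=0.0)       # no smoothing
smoothed = viterbi_smooth(obs_cost, grid, jump_weight=1.0)  # jumps penalized
```

With `jump_weight=0.0` the path simply follows the per-frame minimum and keeps the outlier, while a positive weight makes the isolated jump more expensive than accepting a slightly worse local cost. A steering prior would enter the same recursion through `prior_cost`.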
Main file
Huber 2013 - On the use of voice descriptors for glottal source shape parameter estimation - CSL in press.pdf (2.32 MB)
Origin: Explicit agreement for this deposit

Dates and versions

hal-00865343, version 1 (21-05-2019)


Cite

Stefan Huber, Axel Röbel. On the use of voice descriptors for glottal source shape parameter estimation. Computer Speech and Language, In press, 0885-2308, www.sciencedirect.com/science/article/pii/S0885230813000776. ⟨10.1016/j.csl.2013.09.006⟩. ⟨hal-00865343⟩
