Improving the conversion of whispered speech recorded with a NAM sensor into audible speech

The NAM-to-speech conversion proposed by Toda and colleagues converts Non-Audible Murmur (NAM) to audible speech through a statistical mapping trained on aligned corpora. It is a very promising technique, but its performance is still insufficient. In this paper, we present our current work on improving the intelligibility and naturalness of speech synthesized from whispered speech with this technique. The first system aims to improve F0 estimation and the voicing decision: a simple neural network detects voiced segments in the whisper, while a GMM estimates a continuous melodic contour from voiced training segments. In the second system, we attempt to integrate visual information to improve spectral estimation, F0 estimation, and the voicing decision.
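As a rough illustration of the first system's two-stage idea (a voicing decision gating a GMM-estimated continuous F0 contour), the sketch below computes the usual minimum mean-square-error estimate E[y | x] under a joint GMM. It is a toy under stated assumptions, not the authors' implementation: the features are one-dimensional, the voicing flags are supplied directly rather than predicted by a neural network, and the names `gmm_map`, `convert_f0`, and `COMPONENTS`, as well as all parameter values, are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_map(x, components):
    """MMSE estimate E[y | x] under a joint GMM over (x, y).

    Each component is a dict with weight w, means mx and my,
    source variance vxx, and cross-covariance vxy.
    """
    # Posterior probability of each component given the source feature x.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likelihoods)
    posteriors = [value / total for value in likelihoods]
    # Posterior-weighted sum of the per-component conditional means of y.
    return sum(
        p * (c["my"] + c["vxy"] / c["vxx"] * (x - c["mx"]))
        for p, c in zip(posteriors, components)
    )


def convert_f0(features, voiced_flags, components):
    """Continuous F0 contour from the GMM, set to 0.0 on unvoiced frames."""
    return [
        gmm_map(x, components) if voiced else 0.0
        for x, voiced in zip(features, voiced_flags)
    ]


# Hypothetical toy parameters (assumed for illustration, not from the paper).
COMPONENTS = [
    {"w": 0.5, "mx": -1.0, "my": 100.0, "vxx": 1.0, "vxy": 5.0},
    {"w": 0.5, "mx": 1.0, "my": 200.0, "vxx": 1.0, "vxy": 5.0},
]

contour = convert_f0([0.0, 0.5, 0.0], [True, True, False], COMPONENTS)
```

Keeping the voicing decision separate from the contour estimate means the GMM only has to model F0 where it is defined, which is one plausible reading of why the abstract describes the contour as continuous.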

Domains

Signal and Image Processing

Main file

vat_JEP08.pdf (771.71 KB)

Origin: Files produced by the author(s)

Contributor: Gérard Bailly

https://hal.science/hal-00339058

Submitted on: Saturday, November 15, 2008, 5:53:40 PM

Last modification on: Thursday, April 4, 2024, 9:36:21 PM

Long-term archiving on: Monday, June 7, 2010, 8:40:11 PM

Dates and versions

hal-00339058, version 1 (15-11-2008)

Identifiers

HAL Id: hal-00339058, version 1

Cite

Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Christian Jutten. Amélioration de la conversion de voix chuchotée enregistrée par capteur NAM vers la voix audible. JEP 2008 - 27e Journées d'Etudes sur la Parole, Jun 2008, Avignon, France. pp.110-113. ⟨hal-00339058⟩

Collections

UGA CNRS OSUG GIPSA GIPSA-DIS GIPSA-DPC GIPSA-MPACIF GIPSA-PMD GIPSA-SIGMAPHY POLYTECH-GRENOBLE
