On the suitability of vocalic sandwiches in a corpus-based TTS engine

David Guennec; Damien Lolive

Conference paper, Year: 2016

David Guennec

Role: Author
PersonId : 955117
IdHAL : 197707475
ORCID : 0009-0006-3265-6321

Expressiveness in Human Centered Data/Media

Damien Lolive

Role: Author
PersonId : 5088
IdHAL : damien-lolive
ORCID : 0000-0002-1110-5444
IdRef : 13017498X

Expressiveness in Human Centered Data/Media

Abstract

Unit selection speech synthesis systems generally rely on target and concatenation costs to select the best unit sequence. The role of the concatenation cost is to ensure that joining two voice segments does not introduce acoustic artefacts. For this task, acoustic distances (MFCC, F0) are typically used, but in many cases they are not enough to prevent concatenation artefacts. Among other strategies, improving corpus coverage by favouring units that naturally lend themselves to joining (vocalic sandwiches) seems to be effective for TTS. In this paper, we investigate whether vocalic sandwiches can be used directly in the unit selection engine when the corpus was not created using that principle. First, the sandwich approach is transposed directly into the unit selection engine with a penalty that greatly favours concatenation on sandwich boundaries. Second, a derived fuzzy version is proposed that relaxes the penalty based on the concatenation cost, with respect to the cost distribution. We show that the sandwich approach, very efficient at the corpus creation step, seems to be inefficient when transposed directly into the unit selection engine. However, we observe that the fuzzy approach enhances synthesis quality, especially on sentences with high concatenation costs.
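The abstract gives no formulas, so the following is only a minimal sketch of how the two penalties could be wired into a join cost. Every name in it (`hard_penalty`, `fuzzy_penalty`, `quantile`, the `low`/`high` thresholds, the penalty value of 10.0) is hypothetical and not taken from the paper. The sketch assumes that "relaxing" means scaling the penalty down when the concatenation cost is low relative to the observed cost distribution, and it uses a linear ramp between two quantiles as one possible fuzzy membership function.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending, non-empty list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def hard_penalty(on_sandwich_boundary, penalty=10.0):
    """Fixed penalty for any join that is not on a sandwich boundary."""
    return 0.0 if on_sandwich_boundary else penalty


def fuzzy_penalty(concat_cost, on_sandwich_boundary, low, high, penalty=10.0):
    """Penalty scaled by where concat_cost falls between low and high.

    Assumption: joins with a cost at or below `low` are not penalised,
    joins at or above `high` get the full penalty, linear ramp in between.
    """
    if on_sandwich_boundary:
        return 0.0
    if high <= low:  # degenerate thresholds: fall back to a hard step
        return penalty if concat_cost >= high else 0.0
    membership = (concat_cost - low) / (high - low)
    membership = min(1.0, max(0.0, membership))
    return penalty * membership


def join_cost(concat_cost, on_sandwich_boundary, low, high, penalty=10.0):
    """Acoustic concatenation cost plus the relaxed sandwich penalty."""
    return concat_cost + fuzzy_penalty(
        concat_cost, on_sandwich_boundary, low, high, penalty
    )


# Hypothetical concatenation costs observed over a corpus (ascending order).
observed_costs = [0.0, 1.0, 2.0, 3.0, 4.0]
low = quantile(observed_costs, 0.25)
high = quantile(observed_costs, 0.75)
```

In this sketch, a join off a sandwich boundary whose cost lies halfway between the two thresholds receives half the penalty, whereas the hard version always adds the full amount. The choice of quantiles and of a linear ramp is an illustration only; the paper may define the relaxation differently.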

Keywords

concatenation cost; corpus-based TTS; unit selection

Domains

Artificial intelligence [cs.AI]; Signal and image processing [eess.SP]; Sound [cs.SD]; Human-computer interaction [cs.HC]

Contributor: Damien Lolive

https://inria.hal.science/hal-01338839

Submitted on: Wednesday, 29 June 2016, 11:43:37

Last modified on: Tuesday, 3 October 2023, 09:49:21

Dates and versions

hal-01338839, version 1 (29-06-2016)

Identifiers

HAL Id: hal-01338839, version 1

Cite

David Guennec, Damien Lolive. On the suitability of vocalic sandwiches in a corpus-based TTS engine. Interspeech, Sep 2016, San Francisco, United States. ⟨hal-01338839⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES ENSSAT IRISA CENTRALESUPELEC IRISA-D6 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

621 views

0 downloads
