Phone-Level Embeddings for Unit Selection Speech Synthesis

Deep neural networks have become the state of the art in speech synthesis. They have been used to directly predict signal parameters or to provide unsupervised descriptions of speech segments through embeddings. In this paper, we present four models, two of which enable us to extract phone-level embeddings for unit selection speech synthesis. Three of the models rely on a feed-forward DNN and the fourth on an LSTM. The resulting embeddings make it possible to replace the usual expert-based target costs with a Euclidean distance in the embedding space. This work is conducted on a French corpus consisting of an 11-hour audiobook. Perceptual tests show that the produced speech is preferred over that of a unit selection method where the target cost is defined by an expert. They also show that the embeddings are general enough to be used for different speech styles without quality loss. Furthermore, objective measures and a perceptual test on statistical parametric speech synthesis show that our models perform comparably to state-of-the-art models for parametric signal generation, in spite of necessary simplifications, namely late time integration and information compression.
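As an illustration of the idea of a target cost defined as a Euclidean distance in the embedding space, the following is a minimal sketch and not the authors' implementation. The function names (`target_cost`, `rank_candidates`), the unit identifiers, and the toy 3-dimensional vectors are all hypothetical; real embeddings would come from a trained model, and a full unit selection system would also combine this cost with a concatenation cost during search.

```python
import math
from typing import Dict, List, Sequence, Tuple


def target_cost(target: Sequence[float], candidate: Sequence[float]) -> float:
    """Euclidean distance between two phone-level embeddings."""
    if len(target) != len(candidate):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((t - c) ** 2 for t, c in zip(target, candidate)))


def rank_candidates(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (unit_id, cost) pairs sorted by increasing target cost."""
    costs = [(unit_id, target_cost(target, emb)) for unit_id, emb in candidates.items()]
    return sorted(costs, key=lambda pair: pair[1])


# Toy data (assumption): the embedding predicted for the phone to synthesize,
# and the embeddings of three candidate units from the corpus.
target_embedding = [0.0, 0.0, 0.0]
candidate_units = {
    "unit_a": [3.0, 4.0, 0.0],
    "unit_b": [1.0, 0.0, 0.0],
    "unit_c": [0.0, 2.0, 0.0],
}

ranking = rank_candidates(target_embedding, candidate_units)
for unit_id, cost in ranking:
    print(f"{unit_id}: {cost:.2f}")
```

In this sketch, the candidate whose embedding lies closest to the target embedding receives the lowest cost, which is the role an expert-defined target cost would otherwise play.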

Domains

Artificial Intelligence [cs.AI], Computation and Language [cs.CL]

Main file

samplepaper.pdf (297.84 KB)

Origin: Files produced by the author(s)

Contributor: Antoine Perquin

https://hal.science/hal-01840812

Submitted on: Monday, July 16, 2018, 17:05:05

Last modified on: Tuesday, October 3, 2023, 09:49:49

Long-term archiving on: Wednesday, October 17, 2018, 16:24:54

Dates and versions

hal-01840812, version 1 (16-07-2018)

Identifiers

HAL Id: hal-01840812, version 1
DOI: 10.1007/978-3-030-00810-9_3

Cite

Antoine Perquin, Gwénolé Lecorvé, Damien Lolive, Laurent Amsaleg. Phone-Level Embeddings for Unit Selection Speech Synthesis. SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Oct 2018, Mons, Belgium. pp.21-31, ⟨10.1007/978-3-030-00810-9_3⟩. ⟨hal-01840812⟩


