Conference papers

Phone-Level Embeddings for Unit Selection Speech Synthesis

Abstract : Deep neural networks have become the state of the art in speech synthesis. They have been used to directly predict signal parameters or to provide unsupervised speech segment descriptions through embeddings. In this paper, we present four models, two of which enable us to extract phone-level embeddings for unit selection speech synthesis. Three of the models rely on a feed-forward DNN, and the last one on an LSTM. The resulting embeddings make it possible to replace the usual expert-based target costs with a Euclidean distance in the embedding space. This work is conducted on a French corpus consisting of an 11-hour audiobook. Perceptual tests show that the produced speech is preferred over a unit selection method where the target cost is defined by an expert. They also show that the embeddings are general enough to be used for different speech styles without quality loss. Furthermore, objective measures and a perceptual test on statistical parametric speech synthesis show that our models perform comparably to state-of-the-art models for parametric signal generation, in spite of necessary simplifications, namely late time integration and information compression.
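
The idea of using a Euclidean distance in the embedding space as a target cost can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `embedding_target_cost`, the toy 2-dimensional vectors, and the choice of picking the single closest candidate are all assumptions made for illustration. A real unit selection system would also combine this target cost with a concatenation cost during the search.

```python
import math

def embedding_target_cost(target_embedding, candidate_embeddings):
    """Return the Euclidean distance from the target embedding
    to each candidate unit embedding (hypothetical helper)."""
    return [math.dist(target_embedding, c) for c in candidate_embeddings]

# Toy 2-dimensional embeddings, made up for illustration only.
target = [0.0, 0.0]
candidates = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]

costs = embedding_target_cost(target, candidates)

# Index of the candidate unit with the lowest target cost.
best = min(range(len(costs)), key=costs.__getitem__)
```

Here the cost is low when a candidate unit lies close to the target phone in the embedding space, which is the role an expert-defined target cost would otherwise play.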

Contributor : Antoine Perquin
Submitted on : Monday, July 16, 2018 - 5:05:05 PM
Last modification on : Friday, April 8, 2022 - 4:08:03 PM
Long-term archiving on : Wednesday, October 17, 2018 - 4:24:54 PM


Antoine Perquin, Gwénolé Lecorvé, Damien Lolive, Laurent Amsaleg. Phone-Level Embeddings for Unit Selection Speech Synthesis. SLSP 2018 - 6th International Conference on Statistical Language and Speech Processing, Oct 2018, Mons, Belgium. pp.21-31, ⟨10.1007/978-3-030-00810-9_3⟩. ⟨hal-01840812⟩