TTS voice corpus reduction for audio-book generation

Meysam Shamsi

Conference paper, Year: 2020

Meysam Shamsi

Role: Author
PersonId: 750650
IdHAL: meysam-shamsi
ORCID: 0000-0002-4104-9826

Institut de Recherche en Informatique et Systèmes Aléatoires

Abstract

With new voice corpora continually emerging, voice corpus reduction is becoming more important in expressive TTS. This study investigates a splitting greedy approach that removes utterances from the corpus. As a first step, five objective measures are compared, and the TTS global cost is found to be the best available metric for approximating perceptual quality. The greedy algorithm uses this measure both to evaluate the candidates at each step and to assess the synthetic quality resulting from its solution. Reducing the voice corpus down to a certain length (1 hour in our experiment) does not degrade the synthetic quality. A modification of the original greedy algorithm reduces its computation time to a reasonable duration. Two perceptual tests were run to compare this greedy method with a random strategy for voice corpus reduction. They revealed that the proposed greedy approach offers no advantage over random selection for corpus reduction.
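The greedy removal loop described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `greedy_reduce`, the toy utterances, and the `toy_cost` function (a coverage count standing in for the TTS global cost) are all assumptions made for illustration, and the splitting and speed-up modifications mentioned in the abstract are not modeled.

```python
from typing import Callable, Dict, FrozenSet


def greedy_reduce(durations: Dict[str, float],
                  cost: Callable[[FrozenSet[str]], float],
                  target_duration: float) -> FrozenSet[str]:
    """Drop one utterance per step until the corpus fits the target duration.

    At each step every single-utterance removal is scored with `cost`,
    and the removal that leaves the lowest-cost corpus is applied.
    """
    kept = frozenset(durations)
    while kept and sum(durations[u] for u in kept) > target_duration:
        # Sorting makes tie-breaking deterministic (first name wins).
        best = min(sorted(kept), key=lambda u: cost(kept - {u}))
        kept = kept - {best}
    return kept


# Hypothetical toy data: each utterance covers a set of speech units.
units = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"a", "c"}, "u4": {"c"}}
durations = {"u1": 3.0, "u2": 3.0, "u3": 3.0, "u4": 1.0}
needed = {"a", "b", "c"}


def toy_cost(kept: FrozenSet[str]) -> float:
    """Assumed stand-in for the TTS global cost: count of uncovered units."""
    covered = set().union(*(units[u] for u in kept))
    return float(len(needed - covered))


reduced = greedy_reduce(durations, toy_cost, target_duration=6.0)
print(sorted(reduced), toy_cost(reduced))
```

Each step scores every remaining candidate, so the number of cost evaluations grows quadratically with the number of utterances. This is consistent with the abstract's remark that the original algorithm needed modification to run in reasonable time.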

Keywords

Text-to-speech, voice corpus, greedy algorithm, perceptual test.

Domains

Computation and Language [cs.CL]

Main file

186.pdf (509.09 KB)

Origin: Publisher files allowed on an open archive

Contributor: Sylvain Pogodalla

https://hal.science/hal-02786200

Submitted on: Sunday, 7 June 2020, 20:39:45

Last modified on: Friday, 24 March 2023, 14:53:17

Dates and versions

hal-02786200, version 1 (7 June 2020)

hal-02786200, version 2 (17 June 2020)

hal-02786200, version 3 (23 June 2020)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id: hal-02786200, version 1

Cite

Meysam Shamsi. TTS voice corpus reduction for audio-book generation. 6e conférence conjointe Journées d'Études sur la Parole (JEP, 31e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition), 2020, Nancy, France. pp.193-204. ⟨hal-02786200v1⟩

177 views

112 downloads