Learning Voice Representation Using Knowledge Distillation For Automatic Voice Casting

Adrien Gresse; Mathias Quillot; Richard Dufour; Jean-François Bonastre

Conference paper, Year: 2020


Adrien Gresse

Role: Author
PersonId : 172309
IdHAL : adrien-gresse

Laboratoire Informatique d'Avignon

Mathias Quillot

Role: Author

Laboratoire Informatique d'Avignon

Richard Dufour

Role: Author
PersonId : 178348
IdHAL : richard-dufour
ORCID : 0000-0003-1203-9108

Laboratoire Informatique d'Avignon

Jean-François Bonastre

Role: Author

Laboratoire Informatique d'Avignon

Abstract

The search for professional voice actors for audiovisual productions is a sensitive task performed by artistic directors (ADs). ADs have a strong appetite for new talents and voices but cannot hold large-scale auditions. Automatic tools able to suggest the most suitable voices are therefore of great interest to the audiovisual industry. In previous work, we showed that acoustic information exists that makes it possible to mimic the ADs' choices. However, the only available information is the ADs' choices in already dubbed multimedia productions. In this paper, we propose a representation-learning strategy to build a character/role representation, called the p-vector. In addition, the large variability between audiovisual productions makes it difficult to obtain homogeneous training datasets. We overcome this difficulty by using knowledge distillation methods to take advantage of external datasets. Experiments are conducted on video-game voice excerpts. Results show a significant improvement using the p-vector compared to the speaker-based x-vector representation.
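The abstract does not say which knowledge distillation variant the paper uses. For reference only, the sketch below shows the generic soft-target formulation, in which a student model is trained to match the temperature-softened output distribution of a teacher model. It is not the authors' method. The function names, the temperature value, and the logit values are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    The T^2 factor is the conventional scaling for soft-target distillation;
    it is an assumption here, not something stated in the abstract.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits over three classes (illustrative values only).
teacher = [2.0, 1.0, 0.1]
close_student = [1.9, 1.1, 0.2]
far_student = [0.1, 1.0, 2.0]

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student whose outputs are close to the teacher's yields a smaller loss than one whose ranking of the classes is reversed, and the loss is zero when the two sets of logits are identical.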

Domains

Machine Learning [stat.ML]

Main file

Learning_Voice_Representation_Using_Knowledge_Distillation_For_Automatic_Voice_Casting.pdf (229.83 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-02572383

Submitted on: Wednesday, May 13, 2020, 16:09:59

Last modified on: Friday, November 12, 2021, 11:50:56

Dates and versions

hal-02572383, version 1 (13-05-2020)

Identifiers

HAL Id: hal-02572383, version 1

Cite

Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre. Learning Voice Representation Using Knowledge Distillation For Automatic Voice Casting. Interspeech, Oct 2020, Shanghai, China. ⟨hal-02572383⟩


Collections

UNIV-AVIGNON LIA

