Conference papers

Influence of speaker pre-training on character voice representation

Abstract: Finding professional voice actors for cultural productions is done by a human operator and raises several difficulties. For several years, researchers have therefore been interested in mimicking the vocal casting process to help human operators find new voices. However, voice casting is an underdefined task. The main issue is that no labels are available to accurately assess the performance of voice casting systems. To tackle these problems, recent work has focused on building a speech representation of acted voices that highlights the character dimension. The proposed approach relies on an initial sequence extractor, taken from a speaker recognition system, which represents a variable-length speech sequence as a single fixed-size vector. It is followed by a dedicated neural network from which the character-based embedding, called the p-vector, is extracted. It is legitimate to ask whether the sequence extractor steers p-vectors too strongly toward speaker information. We therefore study the impact of speaker pre-training on character representation learning. Compared with a directly trained character representation, the results show that speaker pre-training provides more character information while retaining the speaker-independent part.
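The two-stage pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature size, the mean-and-standard-deviation pooling used as the sequence extractor, the layer sizes, and the function names are all hypothetical, and the weights are random rather than trained. It shows only that speech sequences of different lengths are mapped to one fixed-size vector, which a second network then maps to a p-vector.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 24               # hypothetical per-frame acoustic feature size
SEQ_EMB_DIM = 2 * FEAT_DIM  # mean and standard deviation, concatenated
HIDDEN_DIM = 32             # hypothetical hidden layer size
P_DIM = 16                  # hypothetical p-vector size


def sequence_extractor(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the speaker-pre-trained sequence extractor.

    Maps a (num_frames, FEAT_DIM) sequence to one fixed-size vector.
    Statistics pooling is an assumption made for this sketch.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


# Randomly initialised weights stand in for a trained character network.
W1 = rng.normal(scale=0.1, size=(SEQ_EMB_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, P_DIM))
b2 = np.zeros(P_DIM)


def p_vector(seq_embedding: np.ndarray) -> np.ndarray:
    """Stand-in for the dedicated network that outputs the p-vector."""
    hidden = np.tanh(seq_embedding @ W1 + b1)
    return hidden @ W2 + b2


# Two utterances of different lengths give embeddings of the same size.
short_utt = rng.normal(size=(120, FEAT_DIM))
long_utt = rng.normal(size=(900, FEAT_DIM))
p_short = p_vector(sequence_extractor(short_utt))
p_long = p_vector(sequence_extractor(long_utt))
print(p_short.shape, p_long.shape)
```

In the setting the abstract studies, the first stage would be either pre-trained on speaker recognition or trained directly for the character task; the sketch does not model that difference.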

https://hal.archives-ouvertes.fr/hal-03348578
Contributor: Mathias Quillot
Submitted on: Sunday, September 19, 2021 - 2:29:21 PM
Last modification on: Thursday, December 1, 2022 - 11:26:04 AM
Long-term archiving on: Tuesday, December 21, 2021 - 9:08:00 AM

File

_SPECOM_2021__Influence_of_Spe...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-03348578, version 1

Citation

Mathias Quillot, Jarod Duret, Richard Dufour, Mickael Rouvier, Jean-François Bonastre. Influence of speaker pre-training on character voice representation. 23rd International Conference on Speech and Computer (SPECOM), Sep 2021, Saint Petersburg, Russia. ⟨hal-03348578⟩
