Similarity Metric Based on Siamese Neural Networks for Voice Casting

Abstract: Dubbing contributes to a wider international distribution of multimedia documents. It aims to replace the original voice in a source language with a new one in a target language. For now, the target voice selection procedure, called voice casting, is performed manually by human experts. This selection is not based exclusively on acoustic similarity between the two voices; it is also supported by more subjective criteria such as the "color" of the voice and socio-cultural choices. The objective of this work is to model a voice similarity metric able to embed all the relevant voice characteristics, including the observers' receptive interests. In this paper, we propose an approach based on Siamese Neural Networks that measures proximity between the original and dubbed voices. We propose an adapted jackknifing cross-validation method to evaluate our similarity model on unseen voices. The results show that we successfully capture information allowing two voices to be associated, with respect to the character's or role's abstract dimension.
Document type: Conference papers
Contributor: Adrien Gresse
Submitted on: Saturday, February 2, 2019 - 12:36:14 PM
Last modification on: Monday, April 29, 2019 - 9:56:48 AM
Long-term archiving on: Friday, May 3, 2019 - 4:03:39 PM






Adrien Gresse, Mathias Quillot, Richard Dufour, Vincent Labatut, Jean-François Bonastre. Similarity Metric Based on Siamese Neural Networks for Voice Casting. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019, Brighton, United Kingdom. pp.6585-6589, ⟨10.1109/ICASSP.2019.8683178⟩. ⟨hal-02004762⟩


