
Similarity Metric Based on Siamese Neural Networks for Voice Casting

Abstract : Dubbing contributes to the wider international distribution of multimedia documents. It replaces the original voice in a source language with a new voice in a target language. Currently, the selection of the target voice, called voice casting, is performed manually by human experts. This selection is not based exclusively on acoustic similarity between the two voices; it also relies on more subjective criteria, such as the "color" of the voice and socio-cultural choices. The objective of this work is to model a voice similarity metric able to embed all the relevant voice characteristics, including the observers' receptive interests. In this paper, we propose an approach based on Siamese Neural Networks that measures the proximity between the original and dubbed voices. We also propose an adapted jackknifing cross-validation method to evaluate our similarity model on unseen voices. The results show that we successfully capture information that allows two voices to be associated with respect to the abstract dimension of the character or role.
Document type : Conference papers

Cited literature : 24 references

Contributor : Adrien Gresse
Submitted on : Saturday, February 2, 2019 - 12:36:14 PM
Last modification on : Friday, November 12, 2021 - 11:18:05 AM




Adrien Gresse, Mathias Quillot, Richard Dufour, Vincent Labatut, Jean-François Bonastre. Similarity Metric Based on Siamese Neural Networks for Voice Casting. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019, Brighton, United Kingdom. pp.6585-6589, ⟨10.1109/ICASSP.2019.8683178⟩. ⟨hal-02004762⟩


