Conference papers

Similarity Metric Based on Siamese Neural Networks for Voice Casting

Abstract: Dubbing contributes to a wider international distribution of multimedia documents. It aims to replace the original voice in a source language with a new one in a target language. For now, the target voice selection procedure, called voice casting, is performed manually by human experts. This selection is not based exclusively on acoustic similarity between the two voices; it also relies on more subjective criteria, such as the "color" of the voice and socio-cultural choices. The objective of this work is to model a voice similarity metric able to embed all the relevant voice characteristics, including the observers' receptive interests. In this paper, we propose an approach based on Siamese Neural Networks that measures the proximity between the original and dubbed voices. We propose an adapted jackknifing cross-validation method to evaluate our similarity model on unseen voices. The results show that we successfully capture information allowing two voices to be associated, with respect to the character's or role's abstract dimension.
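
The abstract mentions evaluating the similarity model on unseen voices with an adapted jackknifing cross-validation, but this record does not describe the procedure. As a rough illustration only, the sketch below shows plain leave-one-group-out splitting, where every voice pair belonging to one character is held out in turn. The function name `leave_one_character_out`, the grouping by character, and the toy identifiers are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, Iterator, List, Tuple

# A pair is (original-voice id, dubbed-voice id); the ids are hypothetical.
Pair = Tuple[str, str]


def leave_one_character_out(
    pairs_by_character: Dict[str, List[Pair]],
) -> Iterator[Tuple[str, List[Pair], List[Pair]]]:
    """Yield (held_out_character, train_pairs, test_pairs), one fold per character.

    Assumption: grouping by character keeps the held-out voices unseen
    during training. The paper's adapted procedure may differ.
    """
    for held_out in sorted(pairs_by_character):
        # Training pairs come from every character except the held-out one.
        train = [
            pair
            for character, pairs in pairs_by_character.items()
            if character != held_out
            for pair in pairs
        ]
        # Test pairs are exactly the held-out character's pairs.
        test = list(pairs_by_character[held_out])
        yield held_out, train, test


# Toy data, invented for illustration only.
toy_pairs = {
    "char_a": [("a_src_1", "a_dub_1"), ("a_src_2", "a_dub_2")],
    "char_b": [("b_src_1", "b_dub_1")],
    "char_c": [("c_src_1", "c_dub_1")],
}

folds = list(leave_one_character_out(toy_pairs))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Each fold trains on the pairs of all remaining characters and tests on the held-out one, so no test pair appears in the training set.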

Cited literature: 24 references

https://hal.archives-ouvertes.fr/hal-02004762
Contributor: Adrien Gresse
Submitted on: Saturday, February 2, 2019 - 12:36:14 PM
Last modification on: Tuesday, January 14, 2020 - 10:38:06 AM

Citation

Adrien Gresse, Mathias Quillot, Richard Dufour, Vincent Labatut, Jean-François Bonastre. Similarity Metric Based on Siamese Neural Networks for Voice Casting. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2019, Brighton, United Kingdom. pp.6585-6589, ⟨10.1109/ICASSP.2019.8683178⟩. ⟨hal-02004762⟩

Metrics

Record views: 136
File downloads: 387