Are disentangled representations all you need to build speaker anonymization systems?

Pierre Champion; Denis Jouvet; Anthony Larcher

Conference paper, Year: 2022

Pierre Champion

Role: Author
PersonId : 1110965

Speech Modeling for Facilitating Oral-Based Communication

Laboratoire d'Informatique de l'Université du Mans

Denis Jouvet

Role: Author
PersonId : 15904
IdHAL : denis-jouvet
IdRef : 029418666

Speech Modeling for Facilitating Oral-Based Communication

Anthony Larcher

Role: Author
PersonId : 20105
IdHAL : anthony-larcher
ORCID : 0000-0003-4398-0224
IdRef : 139544569

Laboratoire d'Informatique de l'Université du Mans

Abstract

Speech signals carry sensitive information, such as the speaker's identity, which raises privacy concerns when speech data is collected. Speaker anonymization aims to transform a speech signal so as to remove the source speaker's identity while leaving the spoken content unchanged. Current methods perform this transformation by relying on content/speaker disentanglement and voice conversion. Usually, an acoustic model from an automatic speech recognition system extracts the content representation, while an x-vector system extracts the speaker representation. Prior work has shown that the extracted features are not perfectly disentangled. This paper addresses how to improve feature disentanglement, and thus the converted anonymized speech. We propose enhancing the disentanglement by removing speaker information from the acoustic model using vector quantization. Evaluation with the VoicePrivacy 2022 toolkit showed that vector quantization helps conceal the original speaker identity while maintaining utility for speech recognition.
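The vector quantization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vector_quantize`, the toy codebook, and the toy frames are hypothetical, and a real system would learn the codebook jointly with the acoustic model. The sketch only shows the core idea, which is that each frame-level feature is replaced by its nearest codebook entry, so fine-grained variation (where speaker information may reside) is discarded.

```python
import numpy as np


def vector_quantize(features, codebook):
    """Map each frame to its nearest codebook vector (squared Euclidean distance).

    features: array of shape (T, D), one feature vector per frame.
    codebook: array of shape (K, D), K discrete prototype vectors.
    Returns the quantized features (T, D) and the chosen indices (T,).
    """
    # Pairwise differences via broadcasting: (T, 1, D) - (1, K, D) -> (T, K, D)
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)  # (T, K)
    indices = np.argmin(dists, axis=1)   # nearest code per frame
    return codebook[indices], indices


# Toy data (assumed for illustration): 4 codes in 3 dimensions, 3 frames.
codebook = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
frames = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.0, 0.05],
    [0.0, 0.2, 0.8],
])

quantized, indices = vector_quantize(frames, codebook)
print(indices.tolist())  # [1, 0, 3]
```

After quantization, every frame is one of only `K` possible vectors, which is what limits how much speaker-specific detail the content representation can carry.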

Keywords

Speaker Anonymization; VoicePrivacy Challenge 2022; Vector Quantization; Voice Conversion

Domains

Artificial Intelligence [cs.AI]

Main file

main.pdf (518.68 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03753746

Submitted on: Tuesday, December 6, 2022, 13:48:40

Last modified on: Monday, September 11, 2023, 17:41:19

Dates and versions

hal-03753746 , version 1 (19-08-2022)

hal-03753746 , version 2 (23-08-2022)

hal-03753746 , version 3 (06-12-2022)

License

Attribution

Identifiers

HAL Id : hal-03753746 , version 3
ARXIV : 2208.10497

Cite

Pierre Champion, Denis Jouvet, Anthony Larcher. Are disentangled representations all you need to build speaker anonymization systems? INTERSPEECH 2022 - Human and Humanizing Speech Technology, Sep 2022, Incheon, South Korea. ⟨hal-03753746v3⟩

Collections

CNRS INRIA UNIV-LEMANS UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD LIUM LIUM-LST ANR
