Denoising x-vectors for Robust Speaker Recognition

Mohammad Mohammadamini; Driss Matrouf; Paul-Gauthier Noé

doi:10.21437/Odyssey.2020-11

Conference paper, Year: 2020


Mohammad Mohammadamini

Role: Author
PersonId: 1070002

Laboratoire Informatique d'Avignon

Driss Matrouf

Role: Author
PersonId: 176307
IdHAL: driss-matrouf
IdRef: 137773439

Laboratoire Informatique d'Avignon

Paul-Gauthier Noé

Role: Author
PersonId: 182438
IdHAL: paul-gauthier-noe
ORCID: 0000-0002-2304-9830

Laboratoire Informatique d'Avignon

Abstract

The use of deep learning methods has led to significant improvements in speaker recognition systems, and the introduction of x-vectors as a speaker modeling method has made these systems more robust. However, the performance of x-vector systems still degrades significantly in challenging environments with noise and reverberation, so denoising techniques remain necessary. In this paper, we attempt, for the first time, to denoise the x-vector speaker embedding itself. Our focus is on additive noise. First, we use the i-MAP method, which assumes that both the noise and the clean x-vectors follow Gaussian distributions. Then, we use denoising autoencoders (DAEs) to reconstruct the clean x-vector from its corrupted version. Next, we propose two hybrid systems that combine the statistical i-MAP method with a DAE. Finally, we propose a novel DAE architecture, named Deep Stacked DAE, composed of several DAEs in which each DAE receives as input the output of its predecessor concatenated with the difference between the noisy x-vector and that predecessor's output. Experiments on the Fabiol corpus show that the hybrid DAE i-MAP method outperforms the conventional DAE and i-MAP methods in several cases, and that Deep Stacked DAE outperforms the other proposed methods in most cases. For utterances longer than 12 seconds, Deep Stacked DAE achieves a 51% improvement in terms of EER, and for utterances shorter than 2 seconds it gives an 18% improvement compared to the baseline system.
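The abstract names the components without giving their equations, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the additive model y = x + n, with clean x-vectors x ~ N(mu_x, cov_x) and noise n ~ N(mu_n, cov_n) independent and estimated from paired clean/noisy training x-vectors; under that assumption the i-MAP estimate is the Gaussian posterior mode mu_x + cov_x (cov_x + cov_n)^{-1} (y - mu_x - mu_n). The DAE part uses hypothetical names (`dae_stage`, `init_stages`, `deep_stacked_dae`) and untrained single-hidden-layer networks; it only shows how each Deep Stacked DAE stage's input is assembled, and feeding the noisy x-vector alone to the first stage is also an assumption.

```python
import numpy as np


def fit_imap(clean, noisy):
    """Estimate Gaussian parameters of clean x-vectors and of the additive
    noise n = y - x from paired training data (one x-vector per row)."""
    noise = noisy - clean
    mu_x, mu_n = clean.mean(axis=0), noise.mean(axis=0)
    cov_x = np.cov(clean, rowvar=False)
    cov_n = np.cov(noise, rowvar=False)
    return mu_x, cov_x, mu_n, cov_n


def imap_denoise(y, mu_x, cov_x, mu_n, cov_n):
    """MAP estimate of the clean x-vector given noisy y = x + n, assuming
    independent Gaussians x ~ N(mu_x, cov_x) and n ~ N(mu_n, cov_n)."""
    residual = y - mu_x - mu_n
    # cov_x @ (cov_x + cov_n)^{-1} @ residual, using solve instead of an explicit inverse
    correction = cov_x @ np.linalg.solve(cov_x + cov_n, residual.T)
    return mu_x + correction.T


def dae_stage(inp, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer DAE stage: tanh encoder, linear decoder."""
    return np.tanh(inp @ w1 + b1) @ w2 + b2


def init_stages(dim, hidden, n_stages, rng):
    """Random (untrained) weights; stages after the first see 2 * dim inputs."""
    stages = []
    for k in range(n_stages):
        in_dim = dim if k == 0 else 2 * dim
        stages.append((rng.normal(scale=0.1, size=(in_dim, hidden)), np.zeros(hidden),
                       rng.normal(scale=0.1, size=(hidden, dim)), np.zeros(dim)))
    return stages


def deep_stacked_dae(y, stages):
    """Each stage after the first receives [z_prev, y - z_prev] concatenated."""
    z = dae_stage(y, *stages[0])  # assumption: first stage sees the noisy x-vector only
    for params in stages[1:]:
        z = dae_stage(np.concatenate([z, y - z], axis=-1), *params)
    return z
```

In a trained system each stage would be fitted to map noisy inputs to clean targets; a hybrid variant could, for example, pass the i-MAP output to the first DAE instead of the raw noisy x-vector, though the abstract does not specify how the two hybrid systems are wired.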

Keywords

Speaker recognition; x-vector; i-MAP; Noise compensation; Denoising autoencoder

Domains

Computer Science [cs]

Main file

78 (1).pdf (398.1 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-02614616

Submitted on: Thursday, 21 May 2020, 12:00:11

Last modified on: Monday, 23 May 2022, 03:26:02

Dates and versions

hal-02614616, version 1 (21-05-2020)

Identifiers

HAL Id: hal-02614616, version 1
DOI: 10.21437/Odyssey.2020-11

Cite

Mohammad Mohammadamini, Driss Matrouf, Paul-Gauthier Noé. Denoising x-vectors for Robust Speaker Recognition. Odyssey 2020 The Speaker and Language Recognition Workshop, Nov 2020, Tokyo, Japan. pp.75-80, ⟨10.21437/Odyssey.2020-11⟩. ⟨hal-02614616⟩


Collections

UNIV-AVIGNON LIA

