
Barlow Twins self-supervised learning for robust speaker recognition

Mohammad Mohammadamini 1 Driss Matrouf 1 Jean-François A Bonastre 1 Sandipana Dowerah 2 Romain Serizel 2 Denis Jouvet 2 
2 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Acoustic noise is a major challenge for speaker recognition systems. State-of-the-art speaker recognition systems rely on deep neural network speaker embedding extractors, known as x-vector extractors, so a noise-robust x-vector extractor is in high demand. In this paper, we introduce the Barlow Twins self-supervised loss function to speaker recognition. The Barlow Twins objective function optimizes two criteria. First, it increases the similarity between two versions of the same signal (i.e., the clean signal and its augmented noisy version), making the speaker embedding invariant to acoustic noise. Second, it reduces the redundancy between the dimensions of the x-vectors, which improves the overall quality of the speaker embeddings. In our work, the Barlow Twins objective function is integrated into a ResNet-based speaker embedding system: it is computed at the embedding layer and optimized jointly with the speaker classifier loss function. Experimental results on the Fabiole corpus show a 22% relative gain in EER in clean conditions and an 18% improvement in the presence of low-SNR noise and reverberation.
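The two criteria described in the abstract can be illustrated with a minimal NumPy sketch of the general Barlow Twins formulation. This is not the authors' implementation: the function names, the off-diagonal weight `lambda_offdiag`, the mixing weight `bt_weight`, and the simple weighted sum used for the joint objective are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def barlow_twins_loss(z_clean, z_noisy, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins loss for two views of the same batch.

    z_clean, z_noisy: arrays of shape (batch, dim) holding the embeddings
    of the clean signals and of their noisy augmented versions.
    lambda_offdiag is a hypothetical weight, not a value from the paper.
    """
    n = z_clean.shape[0]
    # Standardize every embedding dimension over the batch.
    za = (z_clean - z_clean.mean(axis=0)) / (z_clean.std(axis=0) + eps)
    zb = (z_noisy - z_noisy.mean(axis=0)) / (z_noisy.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = za.T @ zb / n
    diag = np.diag(c)
    # Invariance term: push the diagonal towards 1, so that the clean and
    # noisy embeddings of the same signal agree.
    invariance = np.sum((1.0 - diag) ** 2)
    # Redundancy-reduction term: push the off-diagonal entries towards 0,
    # so that different embedding dimensions are decorrelated.
    redundancy = np.sum(c ** 2) - np.sum(diag ** 2)
    return invariance + lambda_offdiag * redundancy


def joint_loss(classifier_loss, z_clean, z_noisy, bt_weight=1.0):
    """Hypothetical joint objective: classifier loss plus weighted BT loss."""
    return classifier_loss + bt_weight * barlow_twins_loss(z_clean, z_noisy)
```

In this sketch the loss is small when the two views produce nearly identical, decorrelated embeddings, and grows when the noisy view drifts away from the clean one.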
Document type :
Conference papers
Complete list of metadata

https://hal.archives-ouvertes.fr/hal-03710445
Contributor : Mohammad Mohammadamini
Submitted on : Friday, July 1, 2022 - 11:20:51 AM
Last modification on : Saturday, July 2, 2022 - 3:46:37 AM

File

BT_SR_INTERSPEECH_Mohammadamin...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03710445, version 2

Citation

Mohammad Mohammadamini, Driss Matrouf, Jean-François A Bonastre, Sandipana Dowerah, Romain Serizel, et al. Barlow Twins self-supervised learning for robust speaker recognition. Interspeech 2022 - Human and Humanizing Speech Technology, Sep 2022, Incheon, South Korea. ⟨hal-03710445v2⟩
