Conference papers

Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition

Raphaël Duroselle 1 Denis Jouvet 1 Irina Illina 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : State-of-the-art language recognition systems are based on discriminative embeddings called x-vectors. Channel and gender distortions produce mismatch in the x-vector space, where embeddings corresponding to the same language are not grouped in a single cluster. To control this mismatch, we propose to train the x-vector DNN with metric learning objective functions. Combining a classification loss with the metric learning n-pair loss improves language recognition performance. Such a system achieves robustness comparable to that of a system trained with a domain adaptation loss function, but without using the domain information. We also analyze the mismatch due to channel and gender, in comparison to language proximity, in the x-vector space. This is achieved using the Maximum Mean Discrepancy divergence measure between groups of x-vectors. Our analysis shows that using the metric learning loss function reduces gender and channel mismatch in the x-vector space, even for languages observed on only one channel in the training set.
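As an illustration of the Maximum Mean Discrepancy measure mentioned in the abstract, the following is a minimal sketch of a biased estimate of the squared MMD between two groups of embeddings. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` value, the function names, and the synthetic 4-dimensional data standing in for x-vectors are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b.

    The kernel choice and bandwidth are assumptions for this sketch.
    """
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=2.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Synthetic stand-ins for groups of x-vectors (hypothetical data).
rng = np.random.default_rng(0)
group_a = rng.normal(size=(200, 4))                # reference group
group_same = rng.normal(size=(200, 4))             # same distribution
group_shifted = rng.normal(size=(200, 4)) + 2.0    # shifted distribution

mmd_same = mmd_squared(group_a, group_same)
mmd_shifted = mmd_squared(group_a, group_shifted)
print("same distribution:   ", mmd_same)
print("shifted distribution:", mmd_shifted)
```

In this sketch, the group drawn from a shifted distribution yields a larger value than the group drawn from the same distribution, which is the sense in which such a measure can quantify mismatch between groups of embeddings.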
Document type :
Conference papers

Cited literature [32 references]

https://hal.archives-ouvertes.fr/hal-02920460
Contributor : Raphaël Duroselle
Submitted on : Monday, August 24, 2020 - 4:39:08 PM
Last modification on : Monday, February 15, 2021 - 1:48:13 PM
Long-term archiving on : Tuesday, December 1, 2020 - 8:37:46 PM

File

raphael_interspeech_v9.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02920460, version 1

Citation

Raphaël Duroselle, Denis Jouvet, Irina Illina. Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition. INTERSPEECH 2020, Oct 2020, Shanghai / Virtual, China. ⟨hal-02920460⟩
