Duration mismatch compensation using four-covariance model and deep neural network for speaker verification

Abstract : Duration mismatch between enrollment and test utterances remains a major concern for the reliability of real-life speaker recognition applications. Two approaches are proposed here to handle this mismatch when using the i-vector representation. The first is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to any shift between i-vectors drawn from two distinct distributions. The second attempts to map the i-vectors of truncated segments of an utterance to the i-vector of the full segment, using deep neural networks (DNN). Our results show that both new approaches outperform standard PLDA by about 10% relative. These back-end methods could also complement methods that quantify i-vector uncertainty during the extraction process when a duration gap is present.
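
The abstract reports its gain in relative terms. As a reading aid, the short sketch below shows how a relative improvement is computed from two error rates. The function name `relative_improvement` and both error values are hypothetical, chosen only for illustration; they are not figures from the paper, and the abstract does not state which error metric is used.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers for illustration only; NOT results from the paper.
baseline = 5.0  # error rate (%) of a baseline system
proposed = 4.5  # error rate (%) of an improved system

gain = relative_improvement(baseline, proposed)
print(f"absolute gain: {baseline - proposed:.1f} points, relative gain: {gain:.0%}")
```

With these made-up values, a drop of half a percentage point corresponds to a 10% relative improvement, which is why a relative figure should always be read against its baseline.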
Document type : Conference papers

Cited literature : 27 references

https://hal.archives-ouvertes.fr/hal-02159806
Contributor : Pierre-Michel Bousquet
Submitted on : Wednesday, June 19, 2019 - 9:08:19 AM
Last modification on : Friday, June 21, 2019 - 1:44:43 AM

File

LIA_pmb_interSp2017.pdf
Files produced by the author(s)

Citation

Pierre-Michel Bousquet, Mickael Rouvier. Duration mismatch compensation using four-covariance model and deep neural network for speaker verification. InterSpeech, 2017, Stockholm, Sweden. ⟨10.21437/Interspeech.2017-93⟩. ⟨hal-02159806⟩
