Anchor and UBM-based Multi-Class MLLR M-Vector System for Speaker Verification

Abstract: In this paper, we propose two techniques to extend the recently introduced m-vector system for speaker verification, which is based on a global Maximum Likelihood Linear Regression (MLLR) transformation represented as a super-vector, into a multi-class MLLR m-vector system in the Universal Background Model (UBM) framework. In the first method, the Gaussian mean vectors of the UBM are grouped into several classes using either conventional K-means or a proposed clustering algorithm based on Expectation Maximization (EM) and Maximum Likelihood (ML) concepts. MLLR transformations are then estimated for the given speech data with respect to each class and are combined into a super-vector, from which the m-vector representing the speaker is obtained. In the second approach, several MLLR transformations are estimated with respect to pre-defined models called anchors. The proposed systems perform better than the conventional system. Furthermore, the proposed UBM-based system does not require any additional alignment of the speech data with respect to the UBM to estimate the multiple MLLR transformations. We further show that the proposed EM & ML clustering algorithm is robust to random initialization and yields system performance equal or comparable to that of K-means. Experimental results are reported on the NIST 2008 SRE core condition over various tasks.
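To make the first (UBM-based) method concrete, here is a minimal pure-Python sketch under stated assumptions; it is not the authors' implementation. It assumes a diagonal-covariance UBM, groups the UBM means with plain K-means (the proposed EM & ML clustering is not reproduced, and the deterministic farthest-first initialisation is an illustrative choice), aligns the frames against the UBM only once, and reuses those statistics to estimate one standard maximum-likelihood MLLR mean transform W per class (adapted mean = W [1, mu]). The per-class transforms are flattened and concatenated into a super-vector; concatenation is an assumed way of forming it, and the later projection from super-vector to m-vector is not shown. All function names (`kmeans`, `class_mllr`, `multiclass_supervector`, `synthetic_example`, ...) and the toy data are hypothetical. No regularisation or back-off for sparsely populated classes is included, so a class with too little data makes `solve` fail. `math.dist` needs Python 3.8 or later.

```python
import math

def kmeans(points, num_classes, iters=20):
    """Lloyd's K-means; farthest-first initialisation is an illustrative choice."""
    centroids = [list(points[0])]
    while len(centroids) < num_classes:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(num_classes), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(num_classes):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied class keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def posteriors(frame, weights, means, variances):
    """Component posteriors of one frame under a diagonal-covariance UBM."""
    logs = [math.log(w) - 0.5 * sum(math.log(2 * math.pi * v) + (x - mu) ** 2 / v
                                    for x, mu, v in zip(frame, m, var))
            for w, m, var in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ubm_stats(frames, weights, means, variances):
    """Zeroth/first-order statistics from a single alignment against the UBM."""
    occ = [0.0] * len(means)
    first = [[0.0] * len(means[0]) for _ in means]
    for o in frames:
        for m, g in enumerate(posteriors(o, weights, means, variances)):
            occ[m] += g
            first[m] = [f + g * x for f, x in zip(first[m], o)]
    return occ, first

def class_mllr(occ, first, means, variances, members):
    """ML mean transform W (d x (d+1)) for one class; adapted mean = W [1, mu]."""
    d = len(means[0])
    W = []
    for i in range(d):  # with diagonal covariances each row is solved separately
        G = [[0.0] * (d + 1) for _ in range(d + 1)]
        k = [0.0] * (d + 1)
        for m in members:
            ext = [1.0] + list(means[m])  # extended mean vector [1, mu_m]
            inv = 1.0 / variances[m][i]
            for r in range(d + 1):
                k[r] += inv * first[m][i] * ext[r]
                for c in range(d + 1):
                    G[r][c] += inv * occ[m] * ext[r] * ext[c]
        W.append(solve(G, k))
    return W

def multiclass_supervector(frames, weights, means, variances, num_classes):
    """Cluster the UBM means, align once, estimate one W per class, then stack."""
    labels = kmeans(means, num_classes)
    occ, first = ubm_stats(frames, weights, means, variances)
    transforms = [class_mllr(occ, first, means, variances,
                             [m for m, lab in enumerate(labels) if lab == c])
                  for c in range(num_classes)]
    return [x for W in transforms for row in W for x in row], transforms

def synthetic_example():
    """Toy 2-D UBM: two separated groups of three means; each group's means are
    moved by a known affine map (A, b) to produce one 'speaker' frame each."""
    means = [[0, 0], [10, 0], [0, 10], [50, 50], [60, 50], [50, 60]]
    variances = [[0.25, 0.25] for _ in means]
    weights = [1 / len(means)] * len(means)
    maps = [([[1.05, 0.1], [0.0, 0.95]], [0.4, -0.2]),
            ([[0.97, 0.0], [0.05, 1.02]], [-0.5, 0.3])]
    frames = [[sum(a * x for a, x in zip(row, mu)) + bi
               for row, bi in zip(*maps[idx // 3])]
              for idx, mu in enumerate(means)]
    return frames, weights, means, variances

sv, transforms = multiclass_supervector(*synthetic_example(), num_classes=2)
print(len(sv))  # 2 classes x 2 rows x 3 columns = 12
```

Because every toy frame aligns almost entirely with the UBM component it was generated from, each recovered W should be close to [b | A] for its class, e.g. `transforms[0]` should be close to `[[0.4, 1.05, 0.1], [-0.2, 0.0, 0.95]]`. Sharing one set of UBM statistics across all classes mirrors the property stated in the abstract: no extra alignment is needed per class.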
Document type: Conference paper

Cited literature: 10 references

https://hal.archives-ouvertes.fr/hal-01690250
Contributor: Claude Barras
Submitted on: Tuesday, January 23, 2018 - 5:10:45 PM
Last modification on: Monday, September 16, 2019 - 11:45:59 AM
Long-term archiving on: Thursday, May 24, 2018 - 10:12:04 AM

File

i13_2450.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01690250, version 1

Citation

Achintya Sarkar, Claude Barras. Anchor and UBM-based Multi-Class MLLR M-Vector System for Speaker Verification. Interspeech 2013, Aug 2013, Lyon, France. ⟨hal-01690250⟩
