Combination of Cepstral and Phonetically Discriminative Features for Speaker Verification

Abstract : Most speaker recognition systems rely on short-term acoustic cepstral features to extract speaker-relevant information from the signal. However, phonetically discriminative features, extracted by a bottleneck multi-layer perceptron (MLP) over longer stretches of time, can provide complementary information and have been adopted in speech transcription systems. We compare speaker verification performance using cepstral features, discriminative features, and a concatenation of both followed by dimension reduction. We consider two speaker recognition systems, one based on maximum likelihood linear regression (MLLR) super-vectors and the other a state-of-the-art i-vector system with two session variability compensation schemes. Experiments are reported on a standard configuration of the NIST SRE 2008 and 2010 databases. The results show that the phonetically discriminative MLP features retain speaker-specific information that is complementary to the short-term cepstral features. Performance improves with both score-domain and feature-domain fusion, and the speaker verification equal error rate (EER) is reduced by up to 50% relative compared to the best i-vector system using only cepstral features.
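As an illustration of the quantities named in the abstract, the following is a minimal sketch of how an equal error rate, a relative EER reduction, and a simple score-domain fusion could be computed. It is not the authors' implementation: the function names are hypothetical, the threshold sweep is one common way to approximate the EER on a finite trial list, and the weighted linear fusion is an assumption, since the abstract does not say how scores were combined.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the operating point where miss and false-alarm rates meet."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials scoring below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.5 means the EER was halved."""
    return (baseline_eer - new_eer) / baseline_eer


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Hypothetical score-domain fusion: weighted sum of two systems' scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b
```

With this definition, an EER that drops from 4% to 2% corresponds to `relative_reduction(0.04, 0.02)`, that is, a 50% relative reduction. In practice the fusion weight would be tuned on held-out data, and the two score streams are often normalized before being combined.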
Document type : Journal article
Cited literature : 30 references
Contributor : Claude Barras
Submitted on : Monday, January 22, 2018 - 10:41:48 PM
Last modification on : Monday, September 16, 2019 - 11:45:47 AM
Long-term archiving on : Thursday, May 24, 2018 - 10:45:28 AM


Achintya Sarkar, Cong-Thanh Do, Viet-Bac Le, Claude Barras. Combination of Cepstral and Phonetically Discriminative Features for Speaker Verification. IEEE Signal Processing Letters, Institute of Electrical and Electronics Engineers, 2014, 21 (9), pp. 1040-1044. ⟨10.1109/LSP.2014.2323432⟩. ⟨hal-01690336⟩


