Improving Speaker Diarization

Abstract : This paper describes the LIMSI speaker diarization system used in the RT-04F evaluation. The RT-04F system builds upon the LIMSI baseline data partitioner, which is used in the broadcast news transcription system. This partitioner provides high cluster purity but tends to split the data from a speaker into several clusters when there is a large quantity of data for that speaker. In the RT-03S evaluation the baseline partitioner had a 24.5% diarization error rate. Several improvements to the baseline diarization system have been made. A standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering; a local BIC criterion is used for comparing single Gaussians with full covariance matrices. A second clustering stage has been added, making use of a speaker identification method: maximum a posteriori adaptation of a reference GMM with 128 Gaussians. A final post-processing stage refines the segment boundaries using the output of the transcription system. Compared to the best configuration of the baseline system for this task, the improved system reduces the speaker error time by over 75% on the development data. On evaluation data, an 8.5% overall diarization error rate was obtained, a 60% reduction in error compared to the baseline.
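As an illustration of the local BIC criterion mentioned in the abstract, the sketch below computes a merge score for two clusters, each modeled by a single full-covariance Gaussian. It uses one common textbook form of the criterion, which may differ in scaling from the formulation in the paper. The function names, the `penalty_weight` parameter, and the synthetic data are assumptions made for this example; only the Python standard library is used.

```python
import math
import random


def covariance(frames):
    """Maximum-likelihood (biased) full covariance of a list of feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    return [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in frames) / n
             for b in range(d)] for a in range(d)]


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(s)
                total += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return total


def delta_bic(frames_i, frames_j, penalty_weight=1.0):
    """Local delta-BIC between two clusters (hypothetical helper).

    A negative value favors merging the clusters; a positive value favors
    keeping them separate. "Local" means the penalty uses only the frame
    counts of the two clusters being compared.
    """
    n_i, n_j = len(frames_i), len(frames_j)
    n, d = n_i + n_j, len(frames_i[0])
    gain = 0.5 * (n * log_det(covariance(frames_i + frames_j))
                  - n_i * log_det(covariance(frames_i))
                  - n_j * log_det(covariance(frames_j)))
    # Free parameters of one full-covariance Gaussian: mean plus covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


# Synthetic 4-dimensional "feature frames" (assumed data, for illustration only).
rng = random.Random(0)
same_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
same_b = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
other = [[rng.gauss(5, 1) for _ in range(4)] for _ in range(200)]

print("same source:", delta_bic(same_a, same_b))
print("different sources:", delta_bic(same_a, other))
```

In an agglomerative scheme of this kind, the pair of clusters with the lowest score would typically be merged at each step, stopping once no pair has a negative score.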
Document type :
Other publications
Contributor : Hakim Amokrane
Submitted on : Wednesday, March 22, 2017 - 11:26:49 PM
Last modification on : Saturday, May 4, 2019 - 1:20:29 AM
Long-term archiving on : Friday, June 23, 2017 - 2:33:47 PM




  • HAL Id : hal-01451540, version 1



Claude Barras, Xuan Zhu, Sylvain Meignier, Jean-Luc Gauvain. Improving Speaker Diarization. 2004, pp.5. ⟨hal-01451540⟩


