Journal articles

Multi-stage speaker diarization of broadcast news

Abstract: This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity, but tends to split the data of speakers with a large quantity of speech into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and by between 40% and 50% relative to a single-stage BIC clustering system.

https://hal.archives-ouvertes.fr/hal-01434241
Contributor : Sylvain Meignier
Submitted on : Wednesday, March 22, 2017 - 5:14:10 PM
Last modification on : Monday, February 10, 2020 - 6:14:06 PM
Document(s) archived on : Friday, June 23, 2017 - 12:33:50 PM

File

sap_rt_diarization.pdf
Files produced by the author(s)

Citation

Claude Barras, Xuan Zhu, Sylvain Meignier, Jean-Luc Gauvain. Multi-stage speaker diarization of broadcast news. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2006, 14 (5), ⟨10.1109/TASL.2006.878261⟩. ⟨hal-01434241⟩
