Multi-stage speaker diarization of broadcast news

Claude Barras; Xuan Zhu; Sylvain Meignier; Jean-Luc Gauvain

doi:10.1109/TASL.2006.878261

Journal article. IEEE Transactions on Audio, Speech and Language Processing. Year: 2006

Claude Barras

Role: Author
PersonId : 17217
IdHAL : claude-barras
IdRef : 165065583

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Xuan Zhu

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Sylvain Meignier

Role: Author
PersonId : 11674
IdHAL : sylvain-meignier
ORCID : 0000-0001-7687-073X
IdRef : 182269086

Laboratoire d'Informatique de l'Université du Mans

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Jean-Luc Gauvain

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Abstract

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system that incorporates a speaker identification step. The system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity, but tends to split the data of speakers with a large quantity of speech into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and by between 40% and 50% relative to a single-stage BIC clustering system.
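To make the BIC agglomerative clustering stage mentioned in the abstract more concrete, the following is a minimal sketch of the generic technique, not the authors' implementation. It assumes each cluster is modeled by a single full-covariance Gaussian, that the penalty is computed from the size of the two clusters being compared, and that a penalty weight `lam` controls the stopping point. The function names (`delta_bic`, `bic_cluster`) and the default value of `lam` are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of frames x (n x d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x_i, x_j, lam=1.0):
    """BIC difference between modeling two clusters jointly and separately.

    A negative value favors merging. Assumes one full-covariance Gaussian
    per cluster (an illustrative assumption, not taken from the paper).
    """
    n_i, n_j = len(x_i), len(x_j)
    n = n_i + n_j
    d = x_i.shape[1]
    merged = np.vstack([x_i, x_j])
    # Number of free parameters of one Gaussian: mean (d) + covariance (d(d+1)/2).
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    gain = 0.5 * (n * logdet_cov(merged)
                  - n_i * logdet_cov(x_i)
                  - n_j * logdet_cov(x_j))
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedy agglomerative clustering: merge the pair with the lowest
    delta BIC until no pair has a negative value.

    Returns a list of clusters, each a list of input segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[k] for k in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = delta_bic(clusters[a], clusters[b], lam)
                if best is None or score < best[0]:
                    best = (score, a, b)
        score, a, b = best
        if score >= 0:  # no remaining merge is supported by the criterion
            break
        clusters[a] = np.vstack([clusters[a], clusters[b]])
        members[a] = members[a] + members[b]
        del clusters[b]
        del members[b]
    return members
```

Calling `bic_cluster` on a list of per-segment feature matrices (frames by dimensions) returns groups of segment indices attributed to the same hypothesized speaker. Raising `lam` makes merging more likely and so yields fewer clusters.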

Keywords

Bayesian information criterion (BIC) clustering; speaker diarization; speaker identification (SID); speaker segmentation and clustering

Domains

Computation and Language [cs.CL]

Main file

sap_rt_diarization.pdf (315.51 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01434241

Submitted on: Wednesday, March 22, 2017, 17:14:10

Last modified on: Saturday, October 7, 2023, 21:36:20

Long-term archiving on: Friday, June 23, 2017, 12:33:50

Dates and versions

hal-01434241 , version 1 (22-03-2017)

Identifiers

HAL Id : hal-01434241 , version 1
DOI : 10.1109/TASL.2006.878261

Cite

Claude Barras, Xuan Zhu, Sylvain Meignier, Jean-Luc Gauvain. Multi-stage speaker diarization of broadcast news. IEEE Transactions on Audio, Speech and Language Processing, 2006, 14 (5), ⟨10.1109/TASL.2006.878261⟩. ⟨hal-01434241⟩

Collections

CNRS UNIV-LEMANS LIMSI LIUM LIUM-LST UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN
