Journal articles

Step-by-step and integrated approaches in broadcast news speaker diarization

Abstract: This paper summarizes the collaboration of the LIA and CLIPS laboratories on speaker diarization of broadcast news during the spring NIST Rich Transcription 2003 evaluation campaign (NIST-RT'03S). The speaker diarization task consists of segmenting a conversation into homogeneous segments, which are then grouped into speaker classes. Two approaches to speaker diarization are described and compared. The first relies on a classical two-step strategy: detection of speaker turns followed by a clustering process. The second uses an integrated strategy in which segment boundaries and the speaker tying of the segments are extracted simultaneously and challenged throughout the process. These two methods are used to investigate various strategies for fusing diarization results. Furthermore, segmentation into acoustic macro-classes is proposed and evaluated as an a priori step to speaker diarization. The objective is to take advantage of the a priori acoustic information in the diarization process. Along with enriching the resulting segmentation with information about speaker gender,

Cited literature: 31 references

https://hal.archives-ouvertes.fr/hal-01318554
Contributor: Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on: Friday, March 24, 2017 - 11:29:50 PM
Last modification on: Thursday, February 27, 2020 - 10:44:03 AM
Document(s) archived on: Sunday, June 25, 2017 - 12:32:41 PM

File

lia-clips.pdf
Files produced by the author(s)

Citation

Sylvain Meignier, Daniel Moraru, Corinne Fredouille, Jean-François Bonastre, Laurent Besacier. Step-by-step and integrated approaches in broadcast news speaker diarization. Computer Speech and Language, Elsevier, 2006, Odyssey 2004: The Speaker and Language Recognition Workshop, 20 (2-3), pp.303-330. ⟨10.1016/j.csl.2005.08.002⟩. ⟨hal-01318554⟩
