Structured Prediction for Speaker Identification in TV Series

Although radio and TV broadcasts are highly structured documents, state-of-the-art speaker identification algorithms do not exploit this structure to improve prediction performance: speech turns are usually identified independently of each other, using unstructured multi-class classification approaches. In this work, we propose to address speaker identification as a sequence labeling task and use two structured prediction techniques to account for the inherent temporal structure of interactions between speakers. The first relies on Conditional Random Fields and can take into account local relations between two consecutive speech turns. The second, based on the SEARN framework, sacrifices exact inference for the sake of model expressiveness and is able to incorporate rich structural information during prediction. Experiments performed on The Big Bang Theory TV series show that structured prediction techniques bring significant improvements over the standard unstructured approach.
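To make the contrast between unstructured and structured prediction concrete, the following is a minimal sketch and not the authors' implementation. It assumes each speech turn already has a per-speaker score (`unary`) and that a hypothetical transition table (`pairwise`) scores pairs of consecutive speakers, as a linear-chain model would. The function names and all numeric scores are illustrative assumptions. SEARN, which gives up exact inference, is not shown.

```python
def independent_decode(unary):
    """Unstructured baseline: label each speech turn on its own (per-turn argmax)."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in unary]


def viterbi_decode(unary, pairwise):
    """Exact decoding for a linear-chain model over speech turns.

    unary[t][k]    : score of speaker k for speech turn t (assumed given)
    pairwise[j][k] : score of speaker j being followed by speaker k (assumed given)
    """
    n_labels = len(unary[0])
    best = list(unary[0])          # best score of any path ending in each label
    backptr = []                   # backptr[t][k]: best previous label for k
    for scores in unary[1:]:
        new_best, pointers = [], []
        for k in range(n_labels):
            j = max(range(n_labels), key=lambda j: best[j] + pairwise[j][k])
            pointers.append(j)
            new_best.append(best[j] + pairwise[j][k] + scores[k])
        best = new_best
        backptr.append(pointers)
    # Follow back-pointers from the best final label to recover the sequence.
    path = [max(range(n_labels), key=best.__getitem__)]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores (made up): two speakers, three turns; the middle turn is ambiguous.
unary = [[2.0, 0.0], [0.6, 0.5], [2.0, 0.0]]
# Hypothetical transition scores that favor a change of speaker between turns.
pairwise = [[-1.0, 0.5], [0.5, -1.0]]

print(independent_decode(unary))        # [0, 0, 0]
print(viterbi_decode(unary, pairwise))  # [0, 1, 0]
```

In this toy case the independent classifier assigns the ambiguous middle turn to speaker 0, while the chain model uses the transition scores to prefer an alternation of speakers.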

Keywords

speaker identification, speaker diarization, sequence labeling, structured prediction

Domains

Computer Science [cs], Computation and Language [cs.CL]

Main file

knyazeva15_interspeech.pdf (310.31 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-01635001

Submitted on: Tuesday, August 31, 2021, 09:00:48

Last modified on: Saturday, October 7, 2023, 21:36:20

Long-term archiving on: Wednesday, December 1, 2021, 18:58:44

Dates and versions

hal-01635001, version 1 (31-08-2021)

Identifiers

HAL Id: hal-01635001, version 1

Cite

Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon. Structured Prediction for Speaker Identification in TV Series. Annual Conference of the International Speech Communication Association, Jan 2015, Dresden, Germany. ⟨hal-01635001⟩

Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-ENGINEERING GS-COMPUTER-SCIENCE

80 views

60 downloads