Conference paper, Year: 2015

Structured Prediction for Speaker Identification in TV Series

Abstract

Though radio and TV broadcasts are highly structured documents, state-of-the-art speaker identification algorithms do not take advantage of this structure to improve prediction performance: speech turns are usually identified independently of each other, using unstructured multi-class classification approaches. In this work, we propose to address speaker identification as a sequence labeling task and use two structured prediction techniques to account for the inherent temporal structure of interactions between speakers. The first relies on Conditional Random Fields (CRF) and can take into account local relations between two consecutive speech turns. The second, based on the SEARN framework, sacrifices exact inference for the sake of model expressiveness and is able to incorporate rich structural information during prediction. Experiments performed on The Big Bang Theory TV series show that structured prediction techniques bring significant improvements over the standard unstructured approach.
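To make the contrast between independent classification and sequence labeling concrete, here is a minimal sketch in Python. It is not the authors' CRF or SEARN system: the function names, the two-speaker setup, and all scores are invented for illustration. It only shows how a linear-chain model with pairwise scores between consecutive speech turns, decoded exactly with the Viterbi algorithm, can overturn a weak per-turn decision that an independent classifier would keep.

```python
def independent_decode(unary):
    """Label each speech turn on its own: argmax over speakers per turn."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in unary]


def viterbi_decode(unary, pairwise):
    """Exact best-path decoding for a linear-chain model.

    unary[t][s]    : score of speaker s for turn t (hypothetical classifier output)
    pairwise[p][s] : score of speaker s directly following speaker p
    """
    n_speakers = len(unary[0])
    best = list(unary[0])  # best score of any path ending in each speaker
    backptr = []           # backptr[t-1][s] = best previous speaker for s at turn t
    for scores in unary[1:]:
        new_best, pointers = [], []
        for s in range(n_speakers):
            prev = max(range(n_speakers), key=lambda p: best[p] + pairwise[p][s])
            pointers.append(prev)
            new_best.append(best[prev] + pairwise[prev][s] + scores[s])
        best = new_best
        backptr.append(pointers)
    # Trace the best path back from the final turn.
    path = [max(range(n_speakers), key=best.__getitem__)]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy data (assumed values): 3 speech turns, 2 speakers (0 and 1).
# The middle turn is acoustically ambiguous, slightly favoring speaker 0.
unary = [[2.0, 0.0], [0.6, 0.4], [2.0, 0.0]]
# Assumed dialogue prior: a speaker change between consecutive turns is
# rewarded, the same speaker taking two turns in a row is penalized.
pairwise = [[-1.0, 0.5], [0.5, -1.0]]

print(independent_decode(unary))        # [0, 0, 0]
print(viterbi_decode(unary, pairwise))  # [0, 1, 0]
```

With these toy scores the independent classifier assigns all three turns to speaker 0, while the chain model relabels the ambiguous middle turn as speaker 1 because the assumed transition scores favor alternation. A real CRF would learn such scores from annotated episodes rather than have them set by hand.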
Main file
knyazeva15_interspeech.pdf (310.31 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01635001, version 1 (31-08-2021)

Identifiers

  • HAL Id: hal-01635001, version 1

Cite

Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon. Structured Prediction for Speaker Identification in TV Series. Annual Conference of the International Speech Communication Association, Jan 2015, Dresden, Germany. ⟨hal-01635001⟩
