Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast

We propose an approach for unsupervised speaker identification in TV broadcast videos that combines acoustic speaker diarization with person names obtained via video OCR from overlaid texts. We compare three methods for propagating the overlaid names to the speech turns, taking into account the co-occurrence duration between the speaker clusters and the names provided by the video OCR and using a task-adapted variant of the TF-IDF information retrieval coefficient. These methods were tested on the REPERE dry-run evaluation corpus, which contains 3 hours of annotated videos. Our best unsupervised system reaches an F-measure of 70.2% when considering all the speakers, and 81.7% if anchor speakers are left out. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models, trained on matching development data and additional TV and radio data, reached an F-measure of only 57.5% when considering all the speakers and 45.7% without anchors.
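As an illustration of the co-occurrence idea mentioned in the abstract, the following minimal sketch assigns to each speaker cluster the overlaid name with which it shares the longest total on-screen overlap. It is not the authors' implementation: the function names, the tuple layouts, and the toy data are assumptions made for this example, and the task-adapted TF-IDF weighting described in the paper is deliberately left out.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Duration (in seconds) for which two time intervals overlap."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def name_clusters(speech_turns, overlaid_names):
    """Give each speaker cluster the overlaid name it co-occurs with longest.

    speech_turns: list of (start, end, cluster_id), assumed diarization output.
    overlaid_names: list of (start, end, name), assumed video OCR output.
    Returns {cluster_id: name}; clusters that never co-occur with any
    name are left out (they stay anonymous).
    """
    cooc = defaultdict(lambda: defaultdict(float))
    for t_start, t_end, cluster in speech_turns:
        for n_start, n_end, name in overlaid_names:
            duration = overlap(t_start, t_end, n_start, n_end)
            if duration > 0:
                # Accumulate co-occurrence over all turns of the cluster.
                cooc[cluster][name] += duration
    return {c: max(names, key=names.get) for c, names in cooc.items()}


# Hypothetical toy data: three clusters, two overlaid names.
turns = [
    (0.0, 10.0, "spk1"),
    (10.0, 25.0, "spk2"),
    (25.0, 30.0, "spk1"),
    (30.0, 40.0, "spk3"),
]
names = [(2.0, 6.0, "Alice Martin"), (9.0, 14.0, "Bob Durand")]

labels = name_clusters(turns, names)
print(labels)
```

In this toy example, "Bob Durand" appears one second before the second speaker starts talking, yet the first cluster is still labeled "Alice Martin" because its accumulated co-occurrence with that name is longer. The third cluster never overlaps with a name and therefore remains unnamed.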

Keywords

unsupervised speaker identification, multimodal fusion, speaker diarization, optical character recognition, reproducible results

Domains

Information Retrieval [cs.IR]

Main file

Poignant-al_Interspeech2012.pdf (274.42 KB)

Origin: Files produced by the author(s)

Contributor: Georges Quénot

https://hal.science/hal-00767427

Submitted on: Wednesday, December 19, 2012, 18:08:36

Last modified on: Friday, April 5, 2024, 03:24:14

Long-term archiving on: Wednesday, March 20, 2013, 11:37:19

Dates and versions

hal-00767427, version 1 (19-12-2012)

Identifiers

HAL Id: hal-00767427, version 1

Cite

Johann Poignant, Hervé Bredin, Viet-Bac Le, Laurent Besacier, Claude Barras, et al. Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast. Interspeech 2012 - Conference of the International Speech Communication Association, Sep 2012, Portland, OR, United States. 4 p. ⟨hal-00767427⟩


Collections

UGA CNRS INRIA LIG LIMSI LIG_TDCGE LIG_TDCGE_GETALP LIG_TDCGE_MRIM SORBONNE-UNIVERSITE POLYTECH-GRENOBLE LISN GS-SPORT-HUMAN-MOVEMENT LIG_SIDCH
