What Makes a Speaker Recognizable in TV Broadcast? Going Beyond Speaker Identification Error Rate

Delphine Charlet; Johann Poignant; Hervé Bredin; Corinne Fredouille; Sylvain Meignier

Conference paper. Year: 2015



Delphine Charlet

Role: Author
PersonId: 1005321

Orange Labs [Lannion]

Johann Poignant

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Hervé Bredin

Role: Author
PersonId: 15856
IdHAL: hbredin
ORCID: 0000-0002-3739-925X
IdRef: 121165779

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Corinne Fredouille

Role: Author
PersonId: 173870
IdHAL: corinne-fredouille
ORCID: 0000-0002-0413-8950
IdRef: 079420516

Laboratoire Informatique d'Avignon

Sylvain Meignier

Role: Author
PersonId: 11674
IdHAL: sylvain-meignier
ORCID: 0000-0001-7687-073X
IdRef: 182269086

Laboratoire d'Informatique de l'Université du Mans

Abstract

Speaker identification approaches for TV broadcast are usually evaluated and compared using global error rates derived from the overall duration of missed detections, false alarms and confusions. Based on an analysis of the outputs of the systems submitted to the final round of the French evaluation campaign REPERE, this paper highlights that these averaged metrics lead to the incorrect intuition that current state-of-the-art algorithms partially recognize all speakers. Setting aside incorrect diarization and adverse acoustic conditions, we show that their performance is in fact essentially bi-modal: in a given show, either all speech turns of a speaker are correctly identified or none of them are. We then try to understand and explain this behavior through performance prediction experiments. These experiments show that the most discriminant speaker characteristics are, first, their total speech duration in the current show and, only second, the amount of training data available to build their acoustic model.
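The abstract's central point, that a duration-weighted global error rate can hide an all-or-nothing behavior per speaker, can be illustrated with a small Python sketch. The record format, the function names `global_error_rate` and `per_speaker_accuracy`, and the toy data are hypothetical and not taken from the paper. The global rate is assumed to follow a DER-style definition (missed, confused and falsely detected speech time divided by total reference speech time), which may differ in detail from the official REPERE metric.

```python
from collections import defaultdict

# Hypothetical speech-turn record: (show, speaker, duration_s, outcome), where
# outcome is "correct", "confusion" or "miss" for the reference speaker.
Turn = tuple[str, str, float, str]


def global_error_rate(turns: list[Turn], false_alarm_s: float = 0.0) -> float:
    """Assumed DER-style rate: (miss + confusion + false alarm) / reference speech."""
    reference = sum(duration for _, _, duration, _ in turns)
    errors = sum(duration for _, _, duration, outcome in turns if outcome != "correct")
    return (errors + false_alarm_s) / reference


def per_speaker_accuracy(turns: list[Turn]) -> dict[tuple[str, str], float]:
    """Share of each speaker's speech time, per show, that is correctly identified."""
    correct: dict[tuple[str, str], float] = defaultdict(float)
    total: dict[tuple[str, str], float] = defaultdict(float)
    for show, speaker, duration, outcome in turns:
        total[(show, speaker)] += duration
        if outcome == "correct":
            correct[(show, speaker)] += duration
    return {key: correct[key] / total[key] for key in total}


# Invented toy outputs of two systems on the same show and the same speakers.
bimodal = [
    ("show1", "A", 10.0, "correct"), ("show1", "A", 10.0, "correct"),
    ("show1", "B", 10.0, "confusion"), ("show1", "B", 10.0, "miss"),
]
partial = [
    ("show1", "A", 10.0, "correct"), ("show1", "A", 10.0, "confusion"),
    ("show1", "B", 10.0, "correct"), ("show1", "B", 10.0, "miss"),
]

for name, turns in [("bimodal", bimodal), ("partial", partial)]:
    print(name, global_error_rate(turns), per_speaker_accuracy(turns))
```

In this toy example both systems obtain the same 50% global error rate, yet only the per-speaker view reveals that the first system identifies speaker A completely and speaker B not at all, which is the kind of bi-modal pattern the paper reports on real system outputs.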

Keywords

speaker recognition; error analysis; TV broadcast

Domains

Computation and Language [cs.CL]

Main file

Charlet2015.pdf (519.81 KB)

Origin: Publisher files authorized for deposit in an open archive


https://hal.science/hal-01433205

Submitted on: Thursday, April 6, 2017, 08:59:44

Last modified on: Saturday, October 7, 2023, 21:36:20

Long-term archiving on: Friday, July 7, 2017, 12:30:14

Dates and versions

hal-01433205, version 1 (April 6, 2017)

Identifiers

HAL Id: hal-01433205, version 1

Cite

Delphine Charlet, Johann Poignant, Hervé Bredin, Corinne Fredouille, Sylvain Meignier. What Makes a Speaker Recognizable in TV Broadcast? Going Beyond Speaker Identification Error Rate. ERRARE Workshop, a satellite event of Interspeech 2015, Sinaia, Romania, 2015. ⟨hal-01433205⟩


Collections

UNIV-AVIGNON CNRS UNIV-LEMANS LIMSI LIUM LIUM-LST UNIV-PARIS-SACLAY LIA SORBONNE-UNIVERSITE ANR LISN GS-COMPUTER-SCIENCE

221 views

72 downloads
