What Makes a Speaker Recognizable in TV Broadcast? Going Beyond Speaker Identification Error Rate

Abstract: Speaker identification approaches for TV broadcast are usually evaluated and compared based on global error rates derived from the overall duration of missed detection, false alarm and confusion. Based on an analysis of the output of the systems submitted to the final round of the French evaluation campaign REPERE, this paper highlights the fact that these average metrics lead to the incorrect intuition that current state-of-the-art algorithms partially recognize all speakers. Setting aside incorrect diarization and adverse acoustic conditions, we show that their performance is in fact essentially bi-modal: in a given show, either all speech turns of a speaker are correctly identified or none of them are. We then try to understand and explain this behavior through performance prediction experiments. These experiments show that the most discriminant speaker characteristics are first their total speech duration in the current show and only then the amount of training data available to build their acoustic model.
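
The contrast the abstract draws between a global error rate and per-speaker behavior can be illustrated with a minimal sketch. It is not the authors' evaluation code: the formula (summed error durations divided by total reference speech duration) is an assumed, commonly used definition, and the function names and toy speech turns are hypothetical.

```python
from collections import defaultdict


def identification_error_rate(total, miss, false_alarm, confusion):
    """Assumed definition: summed error durations over total reference speech duration."""
    return (miss + false_alarm + confusion) / total


def per_speaker_accuracy(turns):
    """Fraction of each speaker's speech duration that was correctly identified.

    turns: iterable of (speaker, duration_in_seconds, correctly_identified).
    """
    correct = defaultdict(float)
    total = defaultdict(float)
    for speaker, duration, ok in turns:
        total[speaker] += duration
        if ok:
            correct[speaker] += duration
    return {speaker: correct[speaker] / total[speaker] for speaker in total}


# Hypothetical toy show: one speaker always recognized, one never recognized.
turns = [
    ("anchor", 30.0, True),
    ("anchor", 20.0, True),
    ("guest", 25.0, False),
    ("guest", 25.0, False),
]

total_speech = sum(duration for _, duration, _ in turns)
confusion = sum(duration for _, duration, ok in turns if not ok)

# Misses and false alarms are set to zero to isolate the confusion term.
global_ier = identification_error_rate(
    total_speech, miss=0.0, false_alarm=0.0, confusion=confusion
)
accuracy = per_speaker_accuracy(turns)
```

On this toy data the global rate is 0.5, which could suggest that every speaker is recognized about half of the time, while the per-speaker view shows one speaker at 1.0 and the other at 0.0.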
Document type:
Conference paper
ERRARE Workshop, a satellite event of Interspeech 2015., 2015, Sinaia, Romania. Interspeech 2015, 2015

Cited literature: 27 references

https://hal.archives-ouvertes.fr/hal-01433205
Contributor: Sylvain Meignier
Submitted on: Thursday, April 6, 2017 - 08:59:44
Last modified on: Monday, March 18, 2019 - 16:22:58
Document(s) archived on: Friday, July 7, 2017 - 12:30:14

File

Charlet2015.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01433205, version 1

Citation

Delphine Charlet, Johann Poignant, Hervé Bredin, Corinne Fredouille, Sylvain Meignier. What Makes a Speaker Recognizable in TV Broadcast? Going Beyond Speaker Identification Error Rate. ERRARE Workshop, a satellite event of Interspeech 2015., 2015, Sinaia, Romania. Interspeech 2015, 2015. 〈hal-01433205〉
