Speaker diarization: A review of recent research

Abstract: Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It was initially proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. In recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, ranging from broadcast news to lectures and meetings, vary greatly and pose different problems, such as access to multiple microphones and multimodal information, or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper we review the current state of the art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
Document type:
Journal article
IEEE transactions on acoustics, speech, and signal processing, Institute of Electrical and Electronics Engineers (IEEE), 2010, pp.1

Cited literature: 124 references

https://hal.archives-ouvertes.fr/hal-00733397
Contributor: Simon Bozonnet
Submitted on: Tuesday, 18 September 2012 - 15:37:02
Last modified on: Thursday, 14 June 2018 - 11:46:01
Document(s) archived on: Wednesday, 19 December 2012 - 03:45:39

File

IEEE_Transaction2010_Speaker_D...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00733397, version 1

Citation

Xavier Anguera, Simon Bozonnet, Nicholas Evans, Corinne Fredouille, Gerald Friedland, et al. Speaker diarization: A review of recent research. IEEE transactions on acoustics, speech, and signal processing, Institute of Electrical and Electronics Engineers (IEEE), 2010, pp.1. 〈hal-00733397〉

Metrics

Record views

249

File downloads

1510