Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model

Israel Dejene Gebru¹, Silèye Ba¹, Georgios Evangelidis¹, Radu Horaud¹
¹ PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: Any multi-party conversation system benefits from speaker diarization, that is, the assignment of speech signals to the participants. Here we cast diarization as a tracking problem in which the active speaker is detected and tracked over time. A probabilistic tracker exploits the on-image (spatial) coincidence of visual and auditory observations and infers a single latent variable that represents the identity of the active speaker. Both visual and auditory observations are explained by a recently proposed weighted-data mixture model, while a multi-case transition model accommodates several options for the dynamics of speaking turns. The modules that translate raw audio and visual data into on-image observations are also described in detail. The performance of the proposed tracker is tested on challenging datasets made available by recent contributions, which are used as baselines for comparison.
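The abstract describes inferring one discrete latent variable (which participant is speaking) from per-frame audio-visual evidence combined with a model of speaking-turn dynamics. The general idea can be illustrated with a standard discrete Bayes (HMM forward) filter. This is a minimal sketch under stated assumptions, not the paper's method: it does not implement the weighted-data mixture model or the multi-case transition model. It assumes per-frame likelihoods p(observations | person k speaks) are already available, and the names `stay_or_switch_transitions` and `forward_filter`, the "stay or switch" transition matrix, and the likelihood values are all hypothetical.

```python
import numpy as np


def stay_or_switch_transitions(n_persons: int, p_stay: float) -> np.ndarray:
    """Hypothetical turn-taking model (NOT the paper's multi-case model):
    the current speaker keeps the turn with probability p_stay, otherwise
    the turn passes uniformly to one of the other persons.
    A[i, j] = p(speaker_t = j | speaker_{t-1} = i)."""
    off_diag = (1.0 - p_stay) / (n_persons - 1)
    A = np.full((n_persons, n_persons), off_diag)
    np.fill_diagonal(A, p_stay)
    return A


def forward_filter(likelihoods: np.ndarray, A: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Discrete Bayes filter over the active-speaker identity.

    likelihoods[t, k] is an assumed, precomputed p(obs_t | person k speaks).
    Returns posteriors[t, k] = p(person k speaks | obs_0..obs_t)."""
    T, K = likelihoods.shape
    posteriors = np.empty((T, K))
    belief = prior
    for t in range(T):
        # Predict: propagate the previous belief through the transition model
        # (the prior is used directly at the first frame).
        predicted = prior if t == 0 else A.T @ belief
        # Update: weight by the observation likelihood and renormalise.
        unnormalised = predicted * likelihoods[t]
        belief = unnormalised / unnormalised.sum()
        posteriors[t] = belief
    return posteriors


# Usage with made-up likelihoods: person 0 seems to speak first, then person 2.
likelihoods = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.9],
])
A = stay_or_switch_transitions(3, p_stay=0.9)
posteriors = forward_filter(likelihoods, A, prior=np.full(3, 1.0 / 3.0))
active_speaker = posteriors.argmax(axis=1).tolist()
print(active_speaker)
```

With `p_stay=0.9` the transition model acts as a smoothing prior on speaking turns: the estimate switches from person 0 to person 2 one frame after the evidence does, which is the kind of trade-off a turn-dynamics model has to balance against responsiveness.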
Document type:
Conference paper
ICCV Workshop on 3D Reconstruction and Understanding with Video and Sound, Dec 2015, Santiago, Chile. pp. 702-708, DOI: 10.1109/ICCVW.2015.96

Cited literature: 19 references

https://hal.archives-ouvertes.fr/hal-01220956
Contributor: Israel Dejene Gebru
Submitted on: Tuesday, October 27, 2015, 10:36:51
Last modified on: Wednesday, April 11, 2018, 01:58:24
Document(s) archived on: Friday, April 28, 2017, 06:34:13

File

main_R1.pdf
Files produced by the author(s)

Citation

Israel Dejene Gebru, Silèye Ba, Georgios Evangelidis, Radu Horaud. Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model. ICCV Workshop on 3D Reconstruction and Understanding with Video and Sound, Dec 2015, Santiago, Chile. pp. 702-708, DOI: 10.1109/ICCVW.2015.96. hal-01220956
