Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model

Israel Dejene Gebru; Silèye Ba; Georgios Evangelidis; Radu Horaud

doi:10.1109/ICCVW.2015.96

Conference paper, year: 2015


Israel Dejene Gebru

Role: Author
PersonId: 959161

Interpretation and Modelling of Images and Videos

Silèye Ba

Role: Author

Interpretation and Modelling of Images and Videos

Georgios Evangelidis

Role: Author

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Author
PersonId: 16183
IdHAL: radu-horaud
ORCID: 0000-0001-5232-024X
IdRef: 032302495

Interpretation and Modelling of Images and Videos

Abstract

Any multi-party conversation system benefits from speaker diarization, that is, the assignment of speech signals to the participants. Here we cast the diarization problem as a tracking problem, in which the active speaker is detected and tracked over time. A probabilistic tracker exploits the on-image (spatial) coincidence of visual and auditory observations and infers a single latent variable that represents the identity of the active speaker. Both visual and auditory observations are explained by a recently proposed weighted-data mixture model, while a multi-case transition model accommodates several options for the dynamics of speaking turns. The modules that translate raw audio and visual data into on-image observations are also described in detail. The proposed tracker is evaluated on challenging datasets made available by recent contributions, which are also used as baselines for comparison.
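The abstract describes inferring a single discrete latent variable (the identity of the active speaker) from on-image audio and visual observations. As a rough illustration only, the Python sketch below implements a generic discrete Bayes filter (HMM forward recursion) over such a variable. It is not the authors' weighted-data mixture model or their multi-case transition model: the "stay or switch" transition matrix, the Gaussian coincidence likelihood, the extra "no visible speaker" state, and all function names and parameter values (`transition_matrix`, `coincidence_likelihood`, `filter_step`, `p_stay`, `sigma`, `none_lik`) are hypothetical choices made for this example.

```python
import numpy as np


def transition_matrix(n_persons, p_stay=0.9):
    """Hypothetical 'stay or switch uniformly' dynamics over n_persons
    identities plus one extra state (index n_persons) meaning that no
    visible person is speaking. A[i, j] = p(s_t = j | s_{t-1} = i)."""
    k = n_persons + 1
    A = np.full((k, k), (1.0 - p_stay) / (k - 1))
    np.fill_diagonal(A, p_stay)
    return A


def coincidence_likelihood(audio_xy, face_xy, sigma=20.0, none_lik=1e-4):
    """Hypothetical observation model: an isotropic 2-D Gaussian (pixels)
    on the distance between the sound source mapped onto the image and
    each face position. The extra state gets a constant likelihood."""
    d2 = np.sum((face_xy - audio_xy) ** 2, axis=1)
    lik = np.exp(-0.5 * d2 / sigma**2) / (2.0 * np.pi * sigma**2)
    return np.append(lik, none_lik)


def filter_step(belief, A, lik):
    """One forward step: predict with the transition matrix, then weight
    by the observation likelihood and renormalize."""
    predicted = A.T @ belief  # sum over previous states i of A[i, j] * belief[i]
    posterior = predicted * lik
    return posterior / posterior.sum()


# Toy example (made-up numbers): three static faces, three audio localizations.
faces = np.array([[100.0, 120.0], [300.0, 110.0], [500.0, 130.0]])
A = transition_matrix(len(faces), p_stay=0.9)
belief = np.full(len(faces) + 1, 1.0 / (len(faces) + 1))
for audio in ([305.0, 115.0], [298.0, 108.0], [495.0, 125.0]):
    belief = filter_step(belief, A, coincidence_likelihood(np.array(audio), faces))
    print(int(np.argmax(belief)))  # most probable state index at this frame
```

With these made-up inputs the printed indices are 1, 1, 2: the middle face is selected first, then the right-hand face once the sound source moves next to it. The constant likelihood of the extra state acts as a crude outlier model for sound sources far from every face; in a real system the face positions would come from a visual tracker at every frame rather than being fixed.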

Keywords

Speaker diarization; Audio-visual fusion; Sound-source localization; Multi-person tracking; Temporal graphical models

Domains

Computer Vision and Pattern Recognition [cs.CV]; Multimedia [cs.MM]

Main file

main_R1.pdf (1.81 MB)

Origin: files produced by the author(s)

Contributor: Israel Dejene Gebru

https://hal.science/hal-01220956

Submitted on: Tuesday, 27 October 2015, 10:36:51

Last modified on: Saturday, 27 April 2024, 03:09:46

Long-term archiving on: Friday, 28 April 2017, 06:34:13

Dates and versions

hal-01220956, version 1 (27-10-2015)

Identifiers

HAL Id: hal-01220956, version 1
DOI: 10.1109/ICCVW.2015.96

Cite

Israel Dejene Gebru, Silèye Ba, Georgios Evangelidis, Radu Horaud. Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model. ICCV Workshop on 3D Reconstruction and Understanding with Video and Sound, Dec 2015, Santiago, Chile. pp. 702-708, ⟨10.1109/ICCVW.2015.96⟩. ⟨hal-01220956⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA INSMI LJK LJK_GI LJK_GI_PERCEPTION INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

568 views

334 downloads
