Audio-Visual Speaker Localization via Weighted Clustering

Israel-Dejene Gebru 1,*, Xavier Alameda-Pineda 1, Radu Horaud 1, Florence Forbes 2
* Corresponding author
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: In this paper we address the problem of detecting and locating speakers using audiovisual data, and we cast it as a clustering problem. We propose a novel weighted clustering method based on a finite mixture model which explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have been proposed before, but not in a generative setting as presented here. We introduce a weighted-data mixture model and formally derive the associated EM procedure. The clustering algorithm is applied to the problem of detecting and localizing a speaker over time using both visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced by introducing a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.
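To make the idea of a weighted-data mixture model with an EM procedure more concrete, here is a minimal sketch in Python. The abstract does not state the form of the model, so the sketch rests on an assumption: each observation x_i carries a fixed positive weight w_i that scales the precision of its Gaussian component, i.e. x_i given component k follows N(mu_k, Sigma_k / w_i). The updates below follow from that assumed model and are not taken from this record. The function name `weighted_gmm_em`, its parameters, the farthest-point initialization, and the synthetic data are all hypothetical, and the cross-modal weighting scheme is not implemented: the weights are simply given as input.

```python
import numpy as np

def weighted_gmm_em(X, w, n_components, n_iter=50, reg=1e-6):
    """EM for a Gaussian mixture where observation i has weight w[i].

    Assumed model (not confirmed by the source):
        x_i | z_i = k  ~  N(mu_k, Sigma_k / w_i)
    so a larger weight marks a more reliable observation.
    """
    n, d = X.shape
    # Deterministic farthest-point initialization of the means (illustrative).
    mu = [X[0]]
    for _ in range(1, n_components):
        d2 = np.min([np.sum((X - m) ** 2, axis=1) for m in mu], axis=0)
        mu.append(X[np.argmax(d2)])
    mu = np.array(mu, dtype=float)
    Sigma = np.stack([np.cov(X.T).reshape(d, d) + reg * np.eye(d)] * n_components)
    pi = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: log responsibilities under N(mu_k, Sigma_k / w_i).
        log_r = np.empty((n, n_components))
        for k in range(n_components):
            L = np.linalg.cholesky(Sigma[k])
            sol = np.linalg.solve(L, (X - mu[k]).T)      # shape (d, n)
            maha = np.sum(sol ** 2, axis=0)              # squared Mahalanobis distance
            log_det = 2.0 * np.sum(np.log(np.diag(L)))
            log_r[:, k] = np.log(pi[k]) - 0.5 * (
                d * np.log(2.0 * np.pi) + log_det - d * np.log(w) + w * maha
            )
        log_r -= log_r.max(axis=1, keepdims=True)        # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weights enter the mean and covariance, not the mixing proportions.
        for k in range(n_components):
            wr = w * r[:, k]
            mu[k] = wr @ X / wr.sum()
            diff = X - mu[k]
            Sigma[k] = (diff * wr[:, None]).T @ diff / r[:, k].sum() + reg * np.eye(d)
        pi = r.mean(axis=0)
    return pi, mu, Sigma, r

# Synthetic usage: two well-separated 2-D clusters with arbitrary weights.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
w = rng.uniform(0.5, 2.0, size=200)
pi, mu, Sigma, r = weighted_gmm_em(X, w, n_components=2)
```

With all weights equal to 1 the updates reduce to standard EM for a Gaussian mixture, which is one way to sanity-check the sketch.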
Document type:
Conference paper
IEEE Workshop on Machine Learning for Signal Processing, Sep 2014, Reims, France. pp.1-6, 2014, <10.1109/MLSP.2014.6958874>

https://hal.archives-ouvertes.fr/hal-01053732
Contributor: Team Perception
Submitted on: Monday, August 11, 2014 - 16:26:58
Last modified on: Friday, September 18, 2015 - 01:05:07
Document(s) archived on: Tuesday, November 25, 2014 - 22:51:05

Files

mainCameraReady-HAL.pdf
Files produced by the author(s)

Citation

Israel-Dejene Gebru, Xavier Alameda-Pineda, Radu Horaud, Florence Forbes. Audio-Visual Speaker Localization via Weighted Clustering. IEEE Workshop on Machine Learning for Signal Processing, Sep 2014, Reims, France. pp.1-6, 2014, <10.1109/MLSP.2014.6958874>. <hal-01053732>
