Audio-Visual Speaker Localization via Weighted Clustering

In this paper we address the problem of detecting and locating speakers using audiovisual data, and we cast it as a clustering problem. We propose a novel weighted clustering method based on a finite mixture model, which explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have already been proposed, but not in a generative setting as presented here. We introduce a weighted-data mixture model and formally derive the associated EM procedure. The clustering algorithm is applied to the problem of detecting and localizing a speaker over time using both visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced by introducing a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.
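To make the idea of a weighted-data mixture model concrete, the following is a minimal sketch of EM for a Gaussian mixture with per-observation weights. It rests on assumptions that the abstract does not state: components are Gaussian, each observation x_i comes with a fixed, known weight w_i > 0, and the weight scales the component precision, so x_i is modeled as N(mu_k, Sigma_k / w_i). The function names (`log_gauss_weighted`, `farthest_first`, `weighted_em`) are hypothetical, and the cross-modal weighting scheme of the paper is not reproduced here. This is an illustration, not the authors' implementation.

```python
import numpy as np

def log_gauss_weighted(X, mean, cov, w):
    """Log-density of N(x_i; mean, cov / w_i) for every row x_i of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, (X - mean).T)          # shape (d, n)
    maha = np.sum(sol ** 2, axis=0)                 # (x-mean)^T cov^{-1} (x-mean)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))       # log|cov|
    # log|cov / w_i| = log|cov| - d * log(w_i)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet - d * np.log(w) + w * maha)

def farthest_first(X, K):
    """Deterministic initial means: start at X[0], then repeatedly take the farthest point."""
    idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, K):
        idx.append(int(np.argmax(dist)))
        dist = np.minimum(dist, np.linalg.norm(X - X[idx[-1]], axis=1))
    return X[idx].astype(float)

def weighted_em(X, w, K, n_iter=50, reg=1e-6):
    """EM for a K-component Gaussian mixture with fixed observation weights w (assumed model)."""
    n, d = X.shape
    means = farthest_first(X, K)
    covs = np.stack([np.eye(d) * X.var(axis=0).mean() for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i; mu_k, Sigma_k / w_i)
        log_r = np.stack(
            [np.log(pis[k]) + log_gauss_weighted(X, means[k], covs[k], w) for k in range(K)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: the weights enter the mean and covariance updates, not the mixing proportions
        for k in range(K):
            rk = r[:, k]
            wr = w * rk
            means[k] = wr @ X / wr.sum()
            diff = X - means[k]
            covs[k] = (diff * wr[:, None]).T @ diff / rk.sum() + reg * np.eye(d)
            pis[k] = rk.mean()
    return pis, means, covs, r
```

Under this assumed model, an observation with a small weight has an inflated covariance and therefore pulls the component means less, which is one way to express that some observations are more reliable than others. Calling `weighted_em(X, np.ones(len(X)), K)` reduces to ordinary Gaussian-mixture EM. The sketch has no safeguard against components that become empty.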

Domains

Computer Vision and Pattern Recognition [cs.CV]; Signal and Image Processing [eess.SP]

Main file

mainCameraReady-HAL.pdf (1.05 MB)

result-mlsp.png (446.02 KB)

Origin: Files produced by the author(s)

Format: Figure, Image

Contributor: Perception team

https://hal.science/hal-01053732

Submitted on: Monday, August 11, 2014, 16:26:58

Last modified on: Thursday, April 4, 2024, 21:20:04

Long-term archiving on: Tuesday, November 25, 2014, 22:51:05

Dates and versions

hal-01053732, version 1 (11-08-2014)

Identifiers

HAL Id: hal-01053732, version 1
DOI: 10.1109/MLSP.2014.6958874

Cite

Israel-Dejene Gebru, Xavier Alameda-Pineda, Radu Horaud, Florence Forbes. Audio-Visual Speaker Localization via Weighted Clustering. IEEE Workshop on Machine Learning for Signal Processing, Sep 2014, Reims, France. pp.1-6, ⟨10.1109/MLSP.2014.6958874⟩. ⟨hal-01053732⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_PS LJK_GI_PERCEPTION LJK_PS_MISTIS INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
