Developing an audio-visual speech source separation algorithm

Looking at the speaker's face is useful for better hearing a speech signal and extracting it from competing sources before identification. This suggests new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, we present a novel algorithm that plugs audio-visual coherence, estimated with statistical tools, into classical blind source separation algorithms, and we describe its assessment. We show, in the case of additive mixtures, that this algorithm outperforms classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm to select the “best” sensor with reference to a target source.
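The idea of steering separation with a visual cue can be illustrated on a toy additive mixture x(t) = A s(t) with two sensors and two sources. The sketch below is not the authors' method: their audio-visual coherence is estimated with statistical tools, whereas here it is replaced by a hypothetical stand-in, the correlation between the frame-wise acoustic RMS envelope and a lip-aperture track. The function names (`frame_rms`, `av_coherence`, `extract_target`, `select_best_output`), the grid search over unit-norm demixing vectors, the mixing matrix and the synthetic amplitude-modulated noise sources are all illustrative assumptions.

```python
import math
import random


def frame_rms(signal, frame_len):
    """Root-mean-square amplitude of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(v * v for v in signal[i:i + frame_len]) / frame_len)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def av_coherence(signal, lip_aperture, frame_len):
    """Hypothetical stand-in for an audio-visual coherence score: correlation
    between the frame-wise acoustic envelope and a lip-aperture track."""
    return pearson(frame_rms(signal, frame_len), lip_aperture)


def extract_target(x1, x2, lip_aperture, frame_len, n_angles=180):
    """Search unit-norm demixing vectors w = (cos t, sin t) for a 2-sensor additive
    mixture and keep the output y = w . x that is most coherent with the lips."""
    best = None
    for k in range(n_angles):
        t = math.pi * k / n_angles  # w and -w give the same envelope, so [0, pi) suffices
        w = (math.cos(t), math.sin(t))
        y = [w[0] * a + w[1] * b for a, b in zip(x1, x2)]
        score = av_coherence(y, lip_aperture, frame_len)
        if best is None or score > best[0]:
            best = (score, w, y)
    return best


def select_best_output(outputs, lip_aperture, frame_len):
    """Given the outputs of a classical BSS algorithm, return the index of the
    output most coherent with the lip track."""
    scores = [av_coherence(y, lip_aperture, frame_len) for y in outputs]
    return max(range(len(outputs)), key=scores.__getitem__)


# --- Toy example (synthetic data): two amplitude-modulated noise sources ---
rng = random.Random(0)
frame_len, n_frames = 200, 80
lips = [0.55 + 0.45 * math.sin(2 * math.pi * k / 25) for k in range(n_frames)]
other = [0.55 + 0.45 * math.sin(2 * math.pi * k / 7 + 1.0) for k in range(n_frames)]
s1 = [lips[i // frame_len] * rng.gauss(0, 1) for i in range(n_frames * frame_len)]
s2 = [other[i // frame_len] * rng.gauss(0, 1) for i in range(n_frames * frame_len)]

# Additive (instantaneous) mixture x = A s with A = [[1.0, 0.6], [0.7, 1.0]]
x1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.7 * a + 1.0 * b for a, b in zip(s1, s2)]

score, w, y = extract_target(x1, x2, lips, frame_len)
# Gains of the target (s1) and the interferer (s2) in y = w . A s
gain_target = w[0] * 1.0 + w[1] * 0.7
gain_interf = w[0] * 0.6 + w[1] * 1.0
print(f"coherence={score:.3f}, interference/target gain={abs(gain_interf / gain_target):.3f}")
print("best of [s2, s1]:", select_best_output([s2, s1], lips, frame_len))
```

In this toy setting, the most lip-coherent demixing vector should lie close to the direction orthogonal to the interferer's mixing column, so the interferer is largely cancelled; `select_best_output` mimics the post-processing use of coherence to pick one output among several.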

Domains

Signal and Image Processing [eess.SP]

Main file

Sodoyer_Speech_Comm_2004.pdf (600.64 KB)

Origin: Files produced by the author(s)

Contributor: Jean-Luc Schwartz

https://hal.science/hal-00186591

Submitted on: Friday, 9 November 2007, 18:31:28

Last modified on: Thursday, 4 April 2024, 21:36:56

Long-term archiving on: Monday, 12 April 2010, 01:47:17

Dates and versions

hal-00186591, version 1 (09-11-2007)

Identifiers

HAL Id: hal-00186591, version 1

Cite

David Sodoyer, Laurent Girin, Christian Jutten, Jean-Luc Schwartz. Developing an audio-visual speech source separation algorithm. Speech Communication, 2004, 44, pp.113-125. ⟨hal-00186591⟩

Collections

UGA CNRS ICP POLYTECH-GRENOBLE