Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures - Archive ouverte HAL Accéder directement au contenu
Article Dans Une Revue IEEE Transactions on Audio, Speech and Language Processing Année : 2007

Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures

Résumé

Looking at the speaker's face can be useful to better hear a speech signal in noisy environment and extract it from competing sources before identification. This suggests that the visual signals of speech (movements of visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm plugging audiovisual coherence of speech signals, estimated by statistical tools, on audio blind source separation (BSS) techniques. This algorithm is applied to the difficult and realistic case of convolutive mixtures. The algorithm mainly works in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture for each frequency channel. Frequency by frequency separation is made by an audio BSS algorithm. The audio and visual informations are modeled by a newly proposed statistical model. This model is then used to solve the standard source permutation and scale factor ambiguities encountered for each frequency after the audio blind separation stage. The proposed method is shown to be efficient in the case of 2 times 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.
Fichier principal
Vignette du fichier
Rivet-AVspeech-IEEE.pdf (743.9 Ko) Télécharger le fichier
Origine : Fichiers éditeurs autorisés sur une archive ouverte
Loading...

Dates et versions

hal-00174100 , version 1 (21-09-2007)

Identifiants

Citer

Bertrand Rivet, Laurent Girin, Christian Jutten. Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures. IEEE Transactions on Audio, Speech and Language Processing, 2007, 15 (1), pp.96-108. ⟨10.1109/TASL.2006.872619⟩. ⟨hal-00174100⟩
229 Consultations
495 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More