Journal article, Speech Communication, Year: 2007

Visual voice activity detection as a help for speech source separation from convolutive mixtures

Abstract

Audio–visual speech source separation combines visual speech processing techniques (e.g., lip parameter tracking) with source separation methods to improve the extraction of a speech source of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with separation methods based on the sparseness of speech: visual information is used as a voice activity detector (VAD), which is combined with a new geometric separation method. The proposed audio–visual method is shown to extract a real spontaneous speech utterance effectively in the difficult case of convolutive mixtures, even when the competing sources are highly non-stationary. Typical gains of 18–20 dB in signal-to-interference ratio are obtained for a wide range of (2 × 2) and (3 × 3) mixtures. Moreover, the overall process is computationally simpler than previously proposed audio–visual separation schemes.
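The gains quoted in the abstract are expressed as signal-to-interference ratios in dB. As a rough illustration of how such a figure can be computed, here is a minimal sketch using the standard power-ratio definition of SIR. The function names (sir_db, sir_gain_db), the synthetic signals, and the toy "separator" that simply attenuates the interference are all assumptions made for this example; they are not taken from the paper and do not reproduce its evaluation protocol.

```python
import numpy as np


def sir_db(target, interference):
    """Signal-to-interference ratio in dB: 10*log10(P_target / P_interference)."""
    p_target = np.mean(np.square(target))
    p_interf = np.mean(np.square(interference))
    return 10.0 * np.log10(p_target / p_interf)


def sir_gain_db(target_in, interf_in, target_out, interf_out):
    """SIR improvement in dB between the input and the output of a separator."""
    return sir_db(target_out, interf_out) - sir_db(target_in, interf_in)


# Synthetic stand-ins for the target and interfering contributions at one sensor
# (assumption: the two contributions are available separately for evaluation).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interf = rng.standard_normal(16000)

# Hypothetical separator: leaves the target untouched and divides the
# interference amplitude by 10, i.e. reduces its power by a factor of 100.
gain = sir_gain_db(target, interf, target, 0.1 * interf)
print(f"SIR gain: {gain:.1f} dB")
```

With this toy separator, a tenfold amplitude reduction of the interference corresponds to a 20 dB gain, which gives a sense of scale for the 18–20 dB range reported above.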
Main file
PEER_stage2_10.1016%2Fj.specom.2007.04.008.pdf (1.86 MB)
Origin: files produced by the author(s)

Dates and versions

hal-00499184, version 1 (09-07-2010)

Identifiers

HAL Id: hal-00499184
DOI: 10.1016/j.specom.2007.04.008

Cite

Bertrand Rivet, Laurent Girin, Christian Jutten. Visual voice activity detection as a help for speech source separation from convolutive mixtures. Speech Communication, 2007, 49 (7-8), pp.667-677. ⟨10.1016/j.specom.2007.04.008⟩. ⟨hal-00499184⟩