Two novel visual voice activity detectors based on appearance models and retinal filtering

Andrew Aubrey 1 Bertrand Rivet 2 Yulia Hicks 1 Laurent Girin 2 Jonathon Chambers 1 Christian Jutten 3
2 GIPSA-MPACIF - MPACIF
GIPSA-DPC - Département Parole et Cognition
3 GIPSA-SIGMAPHY - SIGMAPHY
GIPSA-DIS - Département Images et Signal
Abstract : In this paper we present two novel methods for visual voice activity detection (V-VAD) that exploit the bimodality of speech (i.e. the coherence between the speaker's lips and the resulting speech). The first method uses appearance parameters of the speaker's lips, obtained from an active appearance model (AAM). A hidden Markov model (HMM) then models how this appearance changes over time. The second method applies a retinal filter to the lip region to extract the required parameter. A single-speaker corpus is processed by each method in turn, and each method classifies voice activity as speech or non-speech. The performance of each method is evaluated individually using receiver operating characteristic (ROC) curves, and the two methods are then compared and discussed. Both methods achieve a high correct silence detection rate at a low false detection rate.
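The abstract evaluates each detector with receiver operating characteristics, reporting the correct silence detection rate against the false detection rate. The sketch below shows how such ROC points could be computed; it is not the authors' evaluation code. It assumes that each video frame receives a scalar score where a higher value means "more likely silence", and that ground-truth silence/speech labels are available. The function names `roc_points` and `detection_at` and this score convention are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               is_silence: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false_detection_rate, correct_silence_detection_rate) pairs.

    Assumption: a frame is declared "silence" when its score >= threshold.
    Thresholds are swept from the strictest (highest score) to the most
    lenient, so the curve runs from (0, 0) up to (1, 1).
    """
    if len(scores) != len(is_silence):
        raise ValueError("scores and labels must have the same length")
    n_sil = sum(1 for s in is_silence if s)    # silence frames (positives)
    n_speech = len(is_silence) - n_sil         # speech frames (negatives)
    if n_sil == 0 or n_speech == 0:
        raise ValueError("need at least one silence and one speech frame")

    points = [(0.0, 0.0)]                      # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        # Silence frames correctly detected at this threshold.
        hits = sum(1 for sc, s in zip(scores, is_silence) if sc >= thr and s)
        # Speech frames wrongly flagged as silence (false detections).
        false = sum(1 for sc, s in zip(scores, is_silence) if sc >= thr and not s)
        points.append((false / n_speech, hits / n_sil))
    return points


def detection_at(points: List[Tuple[float, float]], max_false_rate: float) -> float:
    """Best correct silence detection rate with false rate <= max_false_rate."""
    return max(det for fdr, det in points if fdr <= max_false_rate)


if __name__ == "__main__":
    # Toy per-frame scores and labels, purely illustrative (not paper data).
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [True, True, False, True, False, False]
    pts = roc_points(scores, labels)
    print(pts)
    print(detection_at(pts, 0.0))
```

On the toy data, two of the three silence frames are detected before any speech frame is misclassified. This is the kind of operating point summarized by "a high correct silence detection rate at a low false detection rate".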

Cited literature : 12 references

https://hal.archives-ouvertes.fr/hal-00188132
Contributor : Bertrand Rivet
Submitted on : Thursday, November 15, 2007 - 5:10:12 PM
Last modification on : Tuesday, July 9, 2019 - 1:21:27 AM
Long-term archiving on : Monday, April 12, 2010 - 2:21:39 AM

File

RivetEUSIPCO07.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00188132, version 1

Citation

Andrew Aubrey, Bertrand Rivet, Yulia Hicks, Laurent Girin, Jonathon Chambers, et al. Two novel visual voice activity detectors based on appearance models and retinal filtering. 15th European Signal Processing Conference (EUSIPCO-2007), Sep 2007, Poznan, Poland. ⟨hal-00188132⟩
