F. Abrard and Y. Deville, Blind separation of dependent sources using the "time-frequency ratio of mixtures" approach, Proc. International Symposium on Signal Processing and its Applications (ISSPA), pp.81-84, 2003.

C. Abry and L. J. Boë, "Laws" for lips, Speech Communication, vol.5, issue.1, pp.97-104, 1986.
DOI : 10.1016/0167-6393(86)90032-4

A. Aubrey, B. Rivet, Y. Hicks, L. Girin, J. Chambers et al., Comparison of appearance models and retinal filtering for visual voice activity detection, Proc. European Signal Processing Conference (EUSIPCO), 2007.

G. Bailly and P. Badin, Seeing tongue movements from outside, Proc. International Conference Spoken Language Processing (ICSLP), pp.1913-1916, 2002.

G. Bailly, M. Berard, F. Elisei, and M. Odisio, Audiovisual speech synthesis, International Journal of Speech Technology, vol.6, issue.4, pp.331-346, 2003.
DOI : 10.1023/A:1025700715107

URL : https://hal.archives-ouvertes.fr/hal-00169556

J. P. Barker and F. Berthommier, Estimation of speech acoustics from visual speech features: a comparison of linear and non-linear models, Proc. Audio-Visual Speech Processing (AVSP), pp.112-117, 1999.

C. Benoit, T. Lallouache, T. Mohamadi, and C. Abry, A Set of French Visemes for Visual Speech Synthesis, in Talking Machines: Theories, Models, and Designs, pp.485-504, 1992.

C. Benoît, T. Mohamadi, and S. Kandel, Effects of Phonetic Context on Audio-Visual Intelligibility of French, Journal of Speech Language and Hearing Research, vol.37, issue.5, pp.1195-1203, 1994.
DOI : 10.1044/jshr.3705.1195

C. Benoît, T. Guiard-Marigny, B. Le Goff, and A. Adjoudani, Which components of the face do humans and machines best speechread?, in Speechreading by Man and Machine: Models, Systems and Applications (NATO ASI Series), pp.315-328, 1996.

P. Bertelson, Ventriloquism: a case of crossmodal perceptual grouping, in Cognitive Contributions to the Perception of Spatial and Temporal Events, pp.347-362, 1999.

G. A. Calvert and R. Campbell, Reading Speech from Still and Moving Faces: The Neural Substrates of Visible Speech, Journal of Cognitive Neuroscience, vol.15, issue.1, pp.57-70, 2003.

N. Campbell, Approaches to conversational speech rhythm: speech activity in two-person telephone dialogues, Proc. International Congress of Phonetic Sciences (ICPhS), pp.343-348, 2007.

P. Cosi, A. Fusaro, and G. Tisato, LUCIA: a new Italian talking-head based on a modified Cohen-Massaro labial coarticulation model, Proc. European Conference on Speech Communication and Technology (EuroSpeech), pp.2269-2272, 2003.

P. de Cuetos, C. Neti, and A. Senior, Audio-visual intent-to-speak detection in human-computer interaction, Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.2373-2376, 2000.

S. Deligne, G. Potamianos, and C. Neti, Audio-visual speech enhancement with AVCDCN (AudioVisual Codebook Dependent Cepstral Normalization), Proc. International Conference on Spoken Language Processing (ICSLP), pp.1449-1452, 2002.

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.5105

Y. Ephraim and D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.32, issue.6, pp.1109-1121, 1984.
DOI : 10.1109/TASSP.1984.1164453

N. P. Erber, Auditory-Visual Perception of Speech, Journal of Speech and Hearing Disorders, vol.40, issue.4, pp.481-492, 1975.
DOI : 10.1044/jshd.4004.481

G. Gibert, G. Bailly, D. Beautemps, F. Elisei, and R. Brun, Analysis and synthesis of the three-dimensional movements of the head, face, and hand of a speaker using cued speech, The Journal of the Acoustical Society of America, vol.118, issue.2, pp.1144-1153, 2005.
DOI : 10.1121/1.1944587

L. Girin, J. Schwartz, and G. Feng, Audio-visual enhancement of speech in noise, The Journal of the Acoustical Society of America, vol.109, issue.6, pp.3007-3020, 2001.
DOI : 10.1121/1.1358887

L. Girin, Joint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding, IEEE Transactions on Speech and Audio Processing, vol.12, issue.3, pp.265-276, 2004.
DOI : 10.1109/TSA.2003.822626

R. Goecke and J. B. Millar, Statistical analysis of the relationship between audio and video speech parameters for Australian English, Proc. Audio-Visual Speech Processing (AVSP), pp.133-138, 2003.

K. W. Grant and P. Seitz, The use of visible speech cues for improving auditory detection of spoken sentences, The Journal of the Acoustical Society of America, vol.108, issue.3, pp.1197-1208, 2000.
DOI : 10.1121/1.1288668

J. Huang, Z. Liu, Y. Wang, Y. Chen, and E. Wong, Integration of multimodal features for video scene classification based on HMM, Proc. IEEE Third Workshop on Multimedia Signal Processing (MMSP), pp.53-58, 1999.
DOI : 10.1109/MMSP.1999.793797

G. Iyengar and C. Neti, A vision-based microphone switch for speech intent detection, Proc. IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pp.101-105, 2001.
DOI : 10.1109/RATFG.2001.938917

J. Jiang, A. Alwan, P. A. Keating, E. T. Auer, and L. E. Bernstein, On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, pp.1174-1188, 2002.
DOI : 10.1155/S1110865702206046

J. Kim, D. , and C. , Investigating the audio???visual speech detection advantage, Speech Communication, vol.44, issue.1-4, pp.1-4, 2004.
DOI : 10.1016/j.specom.2004.09.008

T. Lallouache, Un poste visage-parole. Acquisition et traitement des contours labiaux (a device for the capture and processing of lip contours), Proc. XVIII Journées d'Étude sur la Parole (JEP), pp.282-286, 1990.

H. Lane and B. Tranel, The Lombard Sign and the Role of Hearing in Speech, Journal of Speech Language and Hearing Research, vol.14, issue.4, pp.677-709, 1971.
DOI : 10.1044/jshr.1404.677

R. Le Bouquin-Jeannès and G. Faucon, Study of a voice activity detector and its influence on a noise reduction system, Speech Communication, vol.16, issue.3, pp.245-254, 1995.
DOI : 10.1016/0167-6393(94)00056-G

P. Liu, W. , and Z. , Voice activity detection using visual information, Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.609-612, 2004.

E. Lombard, Le signe de l'élévation de la voix (the sign of the raising of the voice), Annales des Maladies de l'Oreille et du Larynx, vol.37, pp.101-119, 1911.

D. Macho, J. Padrell, A. Abad, C. Nadeu, J. Hernando et al., Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the CHIL Seminar Corpus, Proc. IEEE International Conference on Multimedia and Expo (ICME), pp.876-879, 2005.
DOI : 10.1109/ICME.2005.1521563

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol.264, issue.5588, pp.746-748, 1976.
DOI : 10.1038/264746a0

K. G. Munhall, P. Gribble, L. Sacco, and M. Ward, Temporal constraints on the McGurk effect, Perception & Psychophysics, vol.58, issue.3, pp.351-362, 1996.
DOI : 10.3758/BF03206811

K. G. Munhall, P. Servos, A. Santi, and M. Goodale, Dynamic visual speech perception in a patient with visual form agnosia, NeuroReport, vol.13, issue.14, pp.1793-1796, 2002.
DOI : 10.1097/00001756-200210070-00020

K. G. Munhall and E. Vatikiotis-Bateson, The moving face during speech communication, in Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-Visual Speech, pp.123-139, 1998.

E. D. Petajan, Automatic lipreading to enhance speech recognition, Proc. Global Telecommunications Conference (GLOBECOM), pp.265-272, 1984.

G. Potamianos, C. Neti, and G. Gravier, Recent advances in the automatic recognition of visual speech, Proceedings of the IEEE, vol.91, issue.9, pp.1306-1326, 2003.

G. Potamianos, C. Neti, and S. Deligne, Joint audio-visual speech processing for recognition and enhancement, Proc. Audio-Visual Speech Processing (AVSP), pp.95-104, 2003.

J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, and A. Rubio, Efficient voice activity detection algorithms using long-term speech information, Speech Communication, vol.42, issue.3-4, pp.271-287, 2004.
DOI : 10.1016/j.specom.2003.10.002

J. Ramírez, J. C. Segura, C. Benítez, L. García, and A. Rubio, Statistical voice activity detection using a multiple observation likelihood ratio test, IEEE Signal Processing Letters, vol.12, issue.10, pp.689-692, 2005.
DOI : 10.1109/LSP.2005.855551

R. Rao, C. , and T. , Cross-Modal Predictive Coding for Talking Head Sequences, Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.2058-2061, 1996.
DOI : 10.1007/978-1-4613-0403-6_37

B. Rivet, L. Girin, and J. C. , Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.1, pp.96-108, 2007.
DOI : 10.1109/TASL.2006.872619

URL : https://hal.archives-ouvertes.fr/hal-00174100

B. Rivet, L. Girin, and J. C. , Visual voice activity detection as a help for speech source separation from convolutive mixtures, Speech Communication, vol.49, issue.7-8, pp.667-677, 2007.
DOI : 10.1016/j.specom.2007.04.008

URL : https://hal.archives-ouvertes.fr/hal-00499184

J. Robert-Ribes, J. L. Schwartz, T. Lallouache, and P. Escudier, Complementarity and synergy in bimodal speech: auditory, visual, and audio-visual identification of French oral vowels in noise, The Journal of the Acoustical Society of America, vol.103, issue.6, pp.3677-3689, 1998.

L. D. Rosenblum and H. M. Saldana, An audiovisual test of kinematic primitives for visual speech perception., Journal of Experimental Psychology: Human Perception and Performance, vol.22, issue.2, pp.318-331, 1996.
DOI : 10.1037/0096-1523.22.2.318

L. D. Rosenblum, J. A. Johnson, and H. M. Saldana, Point-Light Facial Displays Enhance Comprehension of Speech in Noise, Journal of Speech Language and Hearing Research, vol.39, issue.6, pp.1159-1170, 1996.
DOI : 10.1044/jshr.3906.1159

J. L. Schwartz, F. Berthommier, and C. Savariaux, Seeing to hear better: evidence for early audio-visual interactions in speech identification, Cognition, vol.93, issue.2, pp.69-78, 2004.
DOI : 10.1016/j.cognition.2004.01.006

URL : https://hal.archives-ouvertes.fr/hal-00186797

D. Sodoyer, L. Girin, C. Jutten, and J. L. Schwartz, Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.11, pp.1165-1173, 2002.
DOI : 10.1155/S1110865702207015

D. Sodoyer, L. Girin, C. Jutten, and J. L. Schwartz, Further experiments on audio-visual speech source separation, Speech Communication, vol.44, issue.1-4, 2004.

D. Sodoyer, B. Rivet, L. Girin, C. Jutten, and J. L. Schwartz, An Analysis of Visual Speech Information Applied to Voice Activity Detection, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.601-604, 2006.
DOI : 10.1109/ICASSP.2006.1660092

URL : https://hal.archives-ouvertes.fr/hal-00361750

J. Sohn, N. S. Kim, and W. Sung, A statistical model-based voice activity detection, IEEE Signal Processing Letters, vol.6, issue.1, pp.1-3, 1999.
DOI : 10.1109/97.736233

W. H. Sumby and I. Pollack, Visual Contribution to Speech Intelligibility in Noise, The Journal of the Acoustical Society of America, vol.26, issue.2, pp.212-215, 1954.
DOI : 10.1121/1.1907309

Q. Summerfield, Some preliminaries to a comprehensive account of audio-visual speech perception, in Hearing by Eye: The Psychology of Lip-Reading, pp.3-51, 1987.

Q. Summerfield, Use of Visual Information for Phonetic Perception, Phonetica, vol.36, issue.4-5, pp.314-331, 1979.
DOI : 10.1159/000259969

S. G. Tanyer and H. Ozer, Voice activity detection in nonstationary noise, IEEE Transactions on Speech and Audio Processing, vol.8, issue.4, pp.478-482, 2000.
DOI : 10.1109/89.848229

S. M. Thomas and T. R. Jordan, Contributions of Oral and Extraoral Facial Movement to Visual and Audiovisual Speech Perception, Journal of Experimental Psychology: Human Perception and Performance, vol.30, issue.5, pp.873-888, 2004.
DOI : 10.1037/0096-1523.30.5.873

W. Wang, D. Cosker, Y. Hicks, S. Sanei, and J. A. Chambers, Video Assisted Speech Source Separation, Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.425-428, 2005.
DOI : 10.1109/ICASSP.2005.1416331

H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, Quantitative association of vocal-tract and facial behavior, Speech Communication, vol.26, issue.1-2, pp.23-43, 1998.
DOI : 10.1016/S0167-6393(98)00048-X

H. Yehia, T. Kuratate, and E. Vatikiotis-Bateson, Facial animation and head motion driven by speech acoustics, Proc. Seminar on Speech Production: Models and Data & CREST Workshop on Models of Speech Production: Motor Planning and Articulatory Modelling, pp.265-268, 2000.