Abstract: Speech perception is an interactive process involving both the auditory and visual modalities. Recent studies demonstrate that somatosensory input can also influence auditory speech perception (Ito et al. 2009; Gick and Derrick 2009). However, it is unknown whether and to what extent somatosensory inputs modulate audio-visual speech perception. To address this question, we explored the neural consequences of somatosensory interactions with audio-visual speech processing. Specifically, using the McGurk effect (the perceptual illusion that occurs when the auditory component of one sound is paired with the visual component of another, leading to the perception of a third sound), we assessed whether somatosensory orofacial stimulation influenced event-related potentials (ERPs) during audio-visual speech perception. We recorded ERPs from 64 scalp sites in response to audio-visual speech and somatosensory stimulation. We presented an auditory stimulus /ba/ synchronized with a video of congruent facial motion (the production of /ba/) or incongruent facial motion (the production of /da/; the McGurk condition). The congruent and incongruent audio-visual stimuli were presented in random order, with and without somatosensory stimulation associated with facial skin deformation. ERPs were also recorded under auditory-alone, somatosensory-alone, and combined auditory-somatosensory stimulus conditions. Subjects were asked to judge whether the production was /ba/ or not. We observed a clear McGurk effect in the behavioral responses, with subjects identifying the sound as /ba/ in the congruent audio-visual condition but not in the incongruent condition. Concurrent somatosensory stimulation modified participants' ability to correctly identify the production as /ba/ relative to the condition without somatosensory stimulation.
We found ERP differences associated with the McGurk effect under the somatosensory conditions. ERPs for the McGurk effect reliably diverged around 280 ms after stimulus onset. The results demonstrate a clear multisensory convergence of somatosensory and audio-visual processing at both the behavioral and neural levels, and suggest that somatosensory information encoding facial motion also influences audio-visual speech processing.