Orofacial somatosensory inputs improve speech sound detection in noisy environments

Rintaro Ogane 1 Jean-Luc Schwartz 1 Takayuki Ito 1, 2
1 GIPSA-PCMD
GIPSA-DPC - Département Parole et Cognition
Abstract : Noise in speech communication reduces intelligibility and makes it more difficult for the listener to detect the talker's utterances. Seeing the talker's facial movements aids the perception of speech sounds in noisy environments (Sumby & Pollack, 1954). More specifically, psychophysical experiments have demonstrated that visual information from facial movements facilitates the detection of speech sounds in noise (the audiovisual speech detection advantage; Grant & Seitz, 2000; Kim & Davis, 2004). Beyond visual information, somatosensory information also intervenes in speech perception: somatosensory inputs have been shown to modify speech perception in quiet (Ito et al., 2009; Ogane et al., 2017), but they might also be useful for the detection of speech sounds in noisy environments. The aim of this study is to examine whether orofacial somatosensory inputs facilitate the detection of speech sounds in noise.

We carried out a detection test involving speech sounds in acoustic noise and examined whether the detection threshold was changed by somatosensory stimulation associated with facial skin deformation. In the auditory perception test, two sequential noise sounds were presented through headphones. A target speech sound /pa/, recorded by a native French speaker, was embedded in one of the two noise stimuli, at a random position in time (0.2 or 0.6 s after noise onset). Participants were asked to identify which noise sound contained the speech stimulus by pressing a keyboard key as quickly as possible. We tested 10 signal-to-noise ratio (SNR) levels between the target speech sound and the background noise (from -8 dB to -17 dB). The percentage of correct detection responses was obtained at each SNR level, providing an estimate of the psychometric function. The detection threshold was defined as the point of 75% correct detection on the estimated psychometric function. We compared the detection threshold in two experimental conditions: a pure auditory condition and a condition in which somatosensory stimulation was added. In the somatosensory condition, facial skin deformation generated by a robotic device was applied in both noise intervals, with stimulation timing matched to the onset of the target speech sound (burst onset). Each condition contained all SNR levels with 20 occurrences per level (hence 200 responses per condition), and the 400 stimuli (grouping the two conditions) were presented in randomized order.

We found that the detection threshold was lowered when somatosensory stimulation was applied (a 0.6 dB decrease in SNR at threshold). This "audio-somatosensory detection advantage" shows that somatosensory inputs play a role in processing speech sounds even in noisy environments, and is consistent with the idea that somatosensory information is part of the speech perception process.
Document type :
Poster communications

https://hal.archives-ouvertes.fr/hal-01859597
Contributor : Rintaro Ogane
Submitted on : Friday, March 29, 2019 - 12:50:34 PM
Last modification on : Tuesday, April 9, 2019 - 3:04:31 PM

File

Ogane2018_SNL2018PosterFinal.p...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01859597, version 1

Citation

Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito. Orofacial somatosensory inputs improve speech sound detection in noisy environments. 10th Annual Society for the Neurobiology of Language Conference (SNL 2018), Aug 2018, Québec City, Canada. ⟨hal-01859597⟩
