The Time Course of Audio-Visual Phoneme Identification: a High Temporal Resolution Study

Carolina Sánchez-García; Sonia Kandel; Christophe Savariaux; Salvador Soto-Faraco

doi:10.1163/22134808-00002560

Journal article, Multisensory Research. Year: 2018

Carolina Sánchez-García

Role: Author

Universitat Pompeu Fabra [Barcelona]

Sonia Kandel

Role: Author
PersonId: 1023725
IdHAL: sonia-kandel
ORCID: 0000-0002-8017-3758

GIPSA - Voix Systèmes Linguistiques et Dialectologie

Christophe Savariaux

Role: Author
PersonId: 18004
IdHAL: christophe-savariaux
ORCID: 0000-0002-5332-7934

GIPSA-Services

Salvador Soto-Faraco

Role: Author

Institució Catalana de Recerca i Estudis Avançats = Catalan Institution for Research and Advanced Studies

Abstract

Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in audio, visual and audio-visual conditions, and computed the Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength of each modality (i.e., saliency). In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V), when that modality was very salient. In other cases, both modalities contributed to identification, hence resulting in audio-visual advantage or interference with respect to unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise within the course of identification of the same utterance, at different times. The outcome of this study suggests that audio-visual speech integration models should take into account the time-varying nature of visual and auditory saliency.
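
The gating procedure described in the abstract can be illustrated with a minimal Python sketch. Only the 10 ms step comes from the abstract; the function names, the example responses, the confidence scale and the scoring rules are assumptions made for illustration. In particular, the abstract does not define the Identification Point or the Recognition Point, so the sketch assumes a convention that is common in gating studies: the Identification Point is the first gate from which the response is correct and never changes again, and the Recognition Point additionally requires a confidence rating at or above a chosen threshold.

```python
from typing import List, Optional, Sequence

GATE_STEP_MS = 10  # step size stated in the abstract


def gate_durations(total_ms: int, step_ms: int = GATE_STEP_MS) -> List[int]:
    """Cumulative fragment lengths in ms: step, 2*step, ..., full utterance."""
    gates = list(range(step_ms, total_ms + 1, step_ms))
    if not gates or gates[-1] != total_ms:
        gates.append(total_ms)  # the last gate always presents the whole utterance
    return gates


def identification_point(responses: Sequence[str], target: str) -> Optional[int]:
    """Index of the first gate from which every response equals the target.

    Assumed definition (not taken from the abstract). Returns None if the
    final run of responses is not correct.
    """
    point = None
    for i, response in enumerate(responses):
        if response == target:
            if point is None:
                point = i
        else:
            point = None  # a later error resets the candidate gate
    return point


def recognition_point(
    responses: Sequence[str],
    confidence: Sequence[float],
    target: str,
    threshold: float,
) -> Optional[int]:
    """Like identification_point, but each gate must also reach the
    confidence threshold (assumed definition and hypothetical scale)."""
    point = None
    for i, (response, rating) in enumerate(zip(responses, confidence)):
        if response == target and rating >= threshold:
            if point is None:
                point = i
        else:
            point = None
    return point


# Hypothetical responses of one participant for a /pafa/ item, one per gate.
example_responses = ["s", "f", "s", "f", "f", "f"]
example_confidence = [1, 2, 2, 3, 5, 6]
ip = identification_point(example_responses, "f")
rp = recognition_point(example_responses, example_confidence, "f", threshold=5)
```

With these made-up data the response stabilises one gate before the confidence criterion is met, which is the kind of gap between the two scores that a gate-by-gate analysis can expose.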

Keywords

Audio-visual; multisensory integration; speech perception; gating

Domains

Cognitive science; Psychology; Linguistics

Contributor: Sonia Kandel

https://hal.science/hal-01983487

Submitted on: Wednesday, 16 January 2019, 14:42:07

Last modified on: Wednesday, 17 April 2024, 14:36:04

Dates and versions

hal-01983487, version 1 (16-01-2019)

Identifiers

HAL Id: hal-01983487, version 1
DOI: 10.1163/22134808-00002560

Cite

Carolina Sánchez-García, Sonia Kandel, Christophe Savariaux, Salvador Soto-Faraco. The Time Course of Audio-Visual Phoneme Identification: a High Temporal Resolution Study. Multisensory Research, 2018, 31 (1-2), pp.57-78. ⟨10.1163/22134808-00002560⟩. ⟨hal-01983487⟩

Collections

UGA CNRS GIPSA GIPSA-DPC GIPSA-VSLD