Duration as perceptual voicing cues in whisper

Yohann Meynadier; Sophie Dufour; Yulia Gaydina

Abstract

This study concerns the production and perception of the phonological voicing contrast in whispered French. Whisper is a mode of phonation naturally used to reduce the perceptibility of speech, mainly by substituting a noisy sound source for the periodic sound source of modal voice. Whispered voice induces many changes: (i) lowered intensity, flattened frequency and raised formants [1-6]; (ii) lengthening of speech units and a decrease in speech rate [3, 5-7]; (iii) increased airflow and air consumption [8]; and (iv) some degree of hyperarticulation [9]. In perception, segmental and suprasegmental information is generally well perceived, with recognition above chance level: (i) vowel identity [10]; (ii) consonant place and manner [11]; even (iii) intonation, accent [3, 6, 12] or tone [13]; and, strikingly, (iv) the voicing feature [6, 11, 14-16], which is targeted here for French. This study focuses on the duration of pre-consonantal vowels and obstruents as secondary phonetic cues in the production and perception of phonological voicing in whispered speech, i.e. without phonetic (physiological and acoustic) voicing. In modal speech, these properties are among the numerous secondary phonetic cues commonly reported for voicing [17]. Durational differences in consonants and pre-consonantal vowels have long been observed: (i) vowels are longer before voiced than before voiceless consonants [2, 19, 20], and (ii) voiceless obstruents are longer than voiced ones [2, 21, 22] (for a review and discussion). A first experiment, on production, confirms that the phonological voicing contrast is also realized in whisper. Alternating between modal and whispered phonation, 4 French speakers read 12 nonsense words and 12 lexical words containing the unvoiced and voiced obstruent pairs /p-b/, /t-d/, /k-g/, /f-v/, /s-z/ and /ʃ-ʒ/ in word-medial position. The list-reading recordings were experimentally controlled: random order, fillers, anechoic room, etc.
As in modal phonation, acoustic durations in whisper show that unvoiced consonants are significantly longer than voiced ones (by 31 ms). The difference between unvoiced and voiced fricatives shrinks from modal speech (delta = 48 ms) to whisper (delta = 37 ms). For stops, the difference remains nearly constant: 28 ms in modal speech and 26 ms in whisper. For pre-consonantal vowels, similar significant differences are observed regardless of phonation mode: delta = 11 ms before stops and delta = 19 ms before fricatives. The durational differences associated with the phonological voicing contrast of obstruents are therefore preserved in whispered production. In a second experiment, on perception, the durations of the medial consonant closure and of the pre-consonantal vowel were acoustically manipulated to match the durations of the other member of a minimal pair (e.g. [d] to [t], and vice versa). The proportion of temporal lengthening or shortening applied to each segment was based on the empirical results of the production test. The perception test was experimentally controlled: stimuli, random order, fillers, intensity level, experimental materials, etc. Initial analyses show that perception decreases only slightly for whispered voiced obstruents (close to 90% correct responses) but, surprisingly, drops dramatically for unvoiced ones (to around chance level). Crucially, the results show that consonant duration has more impact on the recognition of voicing than vowel duration. These effects are cumulative in some cases. The results are discussed in relation to previous studies. Finally, to our knowledge, this study is the first attempt (at least for French) to clarify the effects of duration on voicing perception in whisper.
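The arithmetic behind the reported differences can be sketched as follows. The delta values are those given in the abstract. The function names and the 100 ms source duration in the usage note are hypothetical, since the abstract reports no absolute segment durations and does not describe the exact manipulation procedure.

```python
# Unvoiced-minus-voiced consonant duration differences (ms), as reported above.
DELTA_MS = {
    "fricatives": {"modal": 48, "whisper": 37},
    "stops": {"modal": 28, "whisper": 26},
}


def whisper_reduction(delta):
    """Return (absolute ms, share of the modal difference) lost in whisper."""
    absolute = delta["modal"] - delta["whisper"]
    return absolute, absolute / delta["modal"]


def scaling_factor(source_ms, target_ms):
    """Factor by which a segment would be stretched (>1) or compressed (<1).

    Hypothetical helper: one plausible way to express a duration
    manipulation, not the procedure used in the study.
    """
    if source_ms <= 0:
        raise ValueError("source duration must be positive")
    return target_ms / source_ms


if __name__ == "__main__":
    for manner, delta in DELTA_MS.items():
        absolute, relative = whisper_reduction(delta)
        print(f"{manner}: contrast shrinks by {absolute} ms "
              f"({relative:.0%} of the modal difference)")
```

As an illustration only, lengthening a hypothetical 100 ms voiced stop by the 26 ms whisper difference would correspond to `scaling_factor(100, 126)`, a stretch of about 1.26.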
