The social component of phonetic recalibration in speech perception

Abstract: Listeners can recalibrate their phonetic boundaries based on exposure to new speech input (Norris et al., 2003). In this study, we investigate whether social factors external to the speech signal during exposure can affect this phonetic recalibration. Specifically, we examine whether phonetic recalibration is modulated by the facial expression of the speaker. Existing studies show that speech production and perception are dynamically sensitive to social characteristics of the speaker (Niedzielski, 1997; Johnson et al., 1999; Babel, 2012), but there has been little research on whether perceptual learning (i.e., phonetic recalibration) is similarly sensitive to social factors. During a training phase, participants heard 60 words with word-medial /d/ (e.g., academia), 60 words with word-medial /t/ (e.g., politician), and 60 filler words containing neither /d/ nor /t/. An additional 180 non-word fillers contained neither /d/ nor /t/. The auditory material was produced by a female native speaker of American English. Participants made a lexical decision on each spoken stimulus. Crucially, the /t/s were manipulated, by shortening voice onset time (VOT) and closure length, to be ambiguous between /t/ and /d/. This manipulation was verified in a separate norming study. During the training phase, a picture of a woman was presented on screen. In one condition (Smile), the woman was smiling; in another (No-Smile), the same woman was not smiling. A further condition presented no picture during training (No-Face). After the training phase, participants categorized tokens from an 11-step /ata/-/ada/ continuum so that we could assess whether their category boundary had shifted. Because the /t/ sounds heard in training are closer to /d/ than usual, perceptual learning should shift the boundary towards the /d/-end of the continuum. Data from 45 female participants appear in Figure 1.
(Data collection is ongoing; the study will ultimately include 60 females, 20 in each of the 3 conditions, and 60 males.) Listeners in the No-Face condition show a robust positive effect of perceptual learning: they choose /t/ more often for higher continuum steps than the No-Training group (z=2.8, p=0.005). (The baseline No-Training data were obtained from a separate group of 20 females.) This verifies that our stimuli and procedures produce a standard perceptual learning effect. Among the conditions with a picture, the No-Smile condition shows a positive perceptual learning trend (z=1.8, p=0.08). Listeners in the Smile condition, however, show little evidence of perceptual learning (z=1.4, p=0.18). (Initial visual inspection of data from 24 males shows a different pattern, perhaps indicating better perceptual learning in the Smile than in the No-Smile condition.) The pattern for females runs somewhat counter to studies that report better learning outcomes with more attractive or likable instructors (Westfall et al., 2016), though Babel (2012) shows that greater likeability and attractiveness can sometimes result in reduced phonetic imitation. The current study provides a novel finding that phonetic recalibration is affected by speech-external social factors, though more research is needed to understand the role of specific facial expressions.
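
As a rough illustration of how a boundary shift on such a continuum could be quantified, the following Python sketch fits a logistic function to the number of /t/ responses at each continuum step and takes the 50% crossover as the category boundary. The response counts are synthetic and purely illustrative; they are not data from this study. The function names, the assumption that step 1 is the /ata/ end of the continuum, and the use of a simple per-group logistic fit (rather than the model behind the reported z-values) are all assumptions made for the example.

```python
import math


def fit_logistic(steps, t_counts, totals, iterations=25):
    """Fit P(/t/ | step) = 1 / (1 + exp(-(b0 + b1 * step))) by Newton-Raphson.

    steps    -- continuum step numbers (assumed: 1 = /ata/ end, 11 = /ada/ end)
    t_counts -- number of /t/ responses at each step
    totals   -- number of trials at each step
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(steps, t_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            # Gradient of the binomial log-likelihood
            g0 += k - n * p
            g1 += (k - n * p) * x
            # Information matrix (negative Hessian)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * d = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def category_boundary(b0, b1):
    """Continuum step at which /t/ and /d/ responses are equally likely."""
    return -b0 / b1


steps = list(range(1, 12))   # 11-step continuum
totals = [20] * 11           # hypothetical: 20 trials per step

# Synthetic /t/-response counts (illustrative only, NOT the study's data)
baseline_counts = [20, 20, 19, 18, 15, 10, 5, 2, 1, 0, 0]
trained_counts = [20, 20, 20, 19, 18, 15, 10, 5, 2, 1, 0]

baseline_boundary = category_boundary(*fit_logistic(steps, baseline_counts, totals))
trained_boundary = category_boundary(*fit_logistic(steps, trained_counts, totals))

print(f"baseline boundary: step {baseline_boundary:.2f}")
print(f"trained boundary:  step {trained_boundary:.2f}")
print(f"shift towards the /d/-end: {trained_boundary - baseline_boundary:.2f} steps")
```

Under these assumptions, a positive shift means that listeners accept more /d/-like tokens as /t/ after training, which is the direction predicted if perceptual learning occurs.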
