Embodiment as preparation: Pupillary responses to words that convey a sense of brightness or darkness

A strongly embodied view of language holds that, to understand a word, you must simulate associated sensory input (e.g. simulate perception of brightness to understand ‘lamp’), and prepare associated actions (e.g. prepare finger movements to understand ‘typing’). To test this, we measured pupillary responses to single words that conveyed a sense of brightness (e.g. ‘day’) or darkness (e.g. ‘night’), or were luminance-neutral (e.g. ‘house’). Crucially, we found that the pupil was largest for darkness-conveying words, intermediate for neutral words, and smallest for brightness-conveying words; however, this semantic pupillary response peaked long after participants had already understood and responded to the words. These findings suggest that word comprehension activates sensory representations, and even triggers physiological (pupillary) responses, but that this occurs too late to be a necessary part of the comprehension process itself. Instead, we suggest that pupillary responses to darkness- and brightness-conveying words (and perhaps embodied language in general) may reflect preparation for the immediate future: When you read the word ‘lamp’, you automatically prepare to look at a lamp, and prepare to read more brightness-related words; this may cause your pupils to constrict in anticipation.


Significance statement
Most researchers agree that language is neither fully embodied, nor fully disembodied; however, a wide variety of intermediate views are consistent with currently available evidence. Therefore, we need new methods to address specific, well-defined questions about how (rather than merely whether, or how strongly) language is embodied. Here we present one such method, based on the pupillary light response. An important advantage of this new method is that it allows us to probe not only whether, but also when, sensory representations emerge during reading of individual words.

How are you able to understand the words that compose this sentence?
Embodied theories of language hold that you understand words (at least those that refer to concrete actions or objects) by mentally simulating what you can do with their referents, and what these referents look, smell, and feel like. For example, according to embodied language, when you read the word 'keyboard', you mentally simulate a typing action; and when you read the word 'sun', you simulate the perception of a bright ball of fire in the sky. Strongly embodied views of language hold that such simulations are necessary for comprehension; that is, to understand what 'sun' means, you need a sensory representation of what it looks like (1,2). Weakly embodied views of language hold that simulations may facilitate language comprehension, but are not strictly necessary; that is, mentally picturing the sun may help you to read 'sun', but you could understand 'sun' even without any sensory representation of it, by relying on a symbolic system (3)(4)(5)(6).
Most support for embodied language comes from two general approaches: behavioral studies that look at compatibility effects between word meaning and perception (or action) (7)(8)(9)(10)(11); and neuroimaging studies that compare brain activity during word reading with brain activity during perception (or action) (12)(13)(14). A compelling example of a behavioral compatibility effect was reported by Meteyard and colleagues (7), who found that upward/downward visual motion affects comprehension speed of words with an upward/downward meaning (8,9); that is, participants decided more quickly that 'fall' was a real word (as opposed to a nonword) when they simultaneously saw downward-moving dots. From this, Meteyard and colleagues concluded that understanding downward-conveying words relies, at least in part, on the same brain areas as perception of downward motion. This conclusion is, on the surface, supported by neuroimaging studies that show overlap in the brain areas that are active during both reading of words associated with motion, and perception of motion (13).
This study and many others clearly show that language interacts with perception and action. But they do not necessarily support a strongly embodied view of language, because they can also be explained in a weakly embodied view by assuming that neural activation spreads between connected areas (4,5). According to a spreading-activation account, the results of Meteyard and colleagues (7) can be explained as follows: One brain area is involved in perceiving downward motion, while another is involved in processing words like 'fall'. These two areas are connected, and activation from one area spreads to the other; this explains why perceiving downward motion facilitates comprehension of downward-conveying words. But, in this view, the brain areas that are involved in perception and language are nevertheless distinct, and have a clear division of labor.
As pointed out in recent leading reviews (3,4), researchers generally agree that language is neither fully embodied nor fully disembodied; but they hold a wide variety of intermediate views, all of which are consistent with currently available evidence. To progress beyond this point, new methods are needed to address specific, well-defined questions about how language is embodied.
Here we present one such method, based on the pupillary light response: the constriction (shrinkage) of the eye's pupils to brightness, and dilation (expansion) of the pupils to darkness.
The pupillary light response was traditionally believed to be a low-level reflex to light; however, recent studies have shown that the light response is sensitive to high-level cognition (15,16). For example, the pupil constricts when you covertly (without looking at it) attend to a bright, compared to a dark, object (17)(18)(19). Similarly, the pupil constricts when you imagine a bright object (20), or when a bright object reaches awareness in a binocular rivalry paradigm (21). These phenomena are often explained in terms of top-down modulation of visual brain areas (22,23); that is, the pupil constricts when you covertly attend to a bright object, because attention enhances the representation of the bright object throughout visual cortex (24).
This reasoning can be naturally extended to embodied language: If word comprehension activates visual cortex (i.e. creates sensory representations), then understanding words that convey a sense of brightness or darkness should trigger pupillary responses, just like attending to (17) or imagining (20) bright or dark objects. Phrased differently, if brightness-conveying words trigger a pupillary constriction relative to darkness-conveying words, this would support the view that word comprehension affects sensory brain areas, and can even trigger physiological (pupillary) responses.

Results
Words were presented for 3 s, or until participants pressed a key. There were four word categories: brightness-conveying, darkness-conveying, neutral, and animal names. Participants (N=30) pressed a key when they saw an animal name, and did nothing otherwise (i.e. go/no-go semantic categorization). Trials on which participants gave an incorrect response (false alarms and misses) were discarded (0.36%). The average response time to animal words was 792 ms (SE = 8.2). No participants or (correct) trials were excluded from the analysis.
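The go/no-go scoring described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the `Trial` class, its field names, and the category labels are hypothetical, and a missing response time (`None`) is assumed to mean that no key was pressed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    # Hypothetical record: category is 'bright', 'dark', 'neutral', or 'animal'
    category: str
    # Key-press latency in ms, or None if no key was pressed (assumption)
    rt_ms: Optional[float]


def classify(trial: Trial) -> str:
    """Label a trial under go/no-go rules: respond only to animal names."""
    is_target = trial.category == "animal"
    responded = trial.rt_ms is not None
    if is_target and responded:
        return "hit"
    if is_target:
        return "miss"
    if responded:
        return "false_alarm"
    return "correct_rejection"


def keep_correct(trials: List[Trial]) -> List[Trial]:
    """Discard misses and false alarms, keeping only correct trials."""
    return [t for t in trials if classify(t) in ("hit", "correct_rejection")]
```

Under this scheme, the proportion of discarded trials is simply one minus the ratio of kept to total trials.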
The main results are shown in Figure 1, in which pupil size is plotted over time from word onset, separately for brightness-conveying, darkness-conveying, and neutral words. (Animal names are not shown, because the pupillary response is distorted by participants' key-press responses.) As predicted, the pupil was smaller when participants read brightness- compared to darkness-conveying words (statistics described below). This effect arose gradually and slowly, and peaked between 1.5 and 2 s after word onset. For neutral words, which did not convey a specific sense of brightness, pupil size was intermediate. In addition to the effect of semantic brightness, there was a pronounced pupillary dilation that peaked around 0.6 s after word onset. This is an alerting effect, or orienting response (23,25), caused by the appearance of the word; this early pupillary response was not clearly modulated by the semantic brightness of the words.
Pupil size is reported relative to pupil size at word onset, and smoothed with a 51 ms Hanning window. Blinks were reconstructed with cubic-spline interpolation (26).
Only brightness- and darkness-conveying words were carefully matched (see Methods), and therefore only these two categories were included in statistical tests. (However, including all words yields similar results; see Supplementary Information.) For each 10 ms window, we fitted a linear mixed-effects model (with the R packages lme4 and lmerTest) with pupil size as dependent variable, semantic brightness (bright or dark) as fixed effect, and random by-participant and by-item intercepts and slopes (27). We commonly use a significance threshold of at least 200 contiguous milliseconds where p < .05 (17); with this criterion, the effect is reliable between 1600 and 2020 ms. However, for visualization, we have annotated Figure 1 with three alpha thresholds (p < .05, p < .01, p < .005) and no minimum number of contiguous samples.
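The contiguity criterion (at least 200 contiguous milliseconds where p < .05) can be illustrated with a short sketch. It assumes that one p-value per 10 ms window has already been obtained from the mixed-effects models; the function name, its parameters, and the convention that the interval end is the onset of the first non-significant window are choices made here for illustration, not taken from the authors' scripts.

```python
from typing import List, Sequence, Tuple


def significant_runs(
    p_values: Sequence[float],
    step_ms: int = 10,
    alpha: float = 0.05,
    min_duration_ms: int = 200,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) intervals in which p < alpha holds for at
    least min_duration_ms of contiguous windows. end_ms is exclusive: it is
    the onset of the first window after the run."""
    runs = []
    run_start = None
    # A sentinel that is never significant closes a run that reaches the end
    for i, p in enumerate(list(p_values) + [float("inf")]):
        if p < alpha:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * step_ms >= min_duration_ms:
                runs.append((run_start * step_ms, i * step_ms))
            run_start = None
    return runs
```

Setting `min_duration_ms=0` reproduces the more liberal annotation style described for Figure 1, where every significant window is marked regardless of run length.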
To test how general the effect is, we also looked at mean pupil size during the 1.5-2 s window for individual participants and words (Figure 2). As shown in Figure 2a, the majority of the participants showed an effect in the predicted direction, and this effect was strongly supported by a default Bayesian one-sided, one-sample t-test (Bf = 14.6 in support of a positive effect; for reference, a classic one-sided, one-sample t-test: t(29) = 3.0, p = .003; using JASP (28)). As shown in Figure 2b, the effect of semantic brightness was small relative to the variability between words; however, there was a clear shift in the distribution of pupil sizes, so that pupil size was slightly larger for darkness- than brightness-conveying words, which was again strongly supported by a default Bayesian one-sided, independent-samples t-test (Bf = 10.7 in support of larger pupils for darkness- than brightness-conveying words; for reference, a classic one-sided, independent-samples t-test:
To test whether the effect of semantic brightness could be due to differences in valence or emotional intensity, we collected subjective ratings for the semantic brightness, valence (positive/negative), and emotional intensity of all words (see Methods for details). First, we found a weak but reliable correlation between brightness and emotional intensity (r = .22, p = .027), such that bright words were rated more emotionally intense than dark words; this would drive an effect in the opposite direction from what we observed, because emotionally intense stimuli (bright words in our case) trigger a strong pupillary dilation (29). In addition, semantic brightness was a much better predictor of pupil size than emotional intensity (see Supplementary Information). Second, we found a strong and reliable correlation between brightness and valence (r = .89, p < .001), such that bright words were rated more positive than dark words. However, valence probably has no effect on pupil size beyond that of emotional intensity; that is, the pupil dilates as much to negative as to positive stimuli if they are of equal emotional intensity (30); or, according to some (largely discredited) reports, negative stimuli (dark words in our case) cause a pupillary constriction (31), which should again drive an effect in the opposite direction from what we observed. In summary, it is theoretically and statistically unlikely that the effect of semantic brightness on pupil size is driven by differences in valence and emotional intensity of our stimuli. If anything, the fact that the bright words were rated as more emotionally intense than the dark words suggests that our results slightly underestimate the effect of semantic brightness on pupil size (because the effect might be counteracted by an effect of emotional intensity).

Discussion
Here we report that the eye's pupils constrict after reading brightness-conveying words (e.g. 'sun') compared to darkness-conveying words (e.g. 'night'). This effect arises slowly and gradually, and, in our experiment, peaked between 1.5 and 2 s after word onset.
Our findings have important implications for theories of embodied language. Our starting premise is that an indirect (i.e. without direct visual stimulation) pupillary light response reflects sensory representations that are similar to those that arise during perception (20), and presumably involve visual brain areas (23). Our findings therefore suggest that word comprehension can induce activity in visual brain areas, and even trigger physiological (pupillary) responses, in a way that corresponds to the word's meaning. This is in line with a strongly embodied view of language, which holds that language processing involves sensory and motor areas of the brain (1, 2).
However, we also found that this semantic pupillary effect arose late, and peaked only about a second after participants had processed the word's meaning and responded to it. This suggests that sensory and motor representations, at least those that are involved in pupillary responses, are not necessary for word comprehension: The pupil responds to the semantic brightness of words, but this pupillary response is not part of word comprehension itself (4).
Is it possible that, in our experiment, word comprehension did rapidly trigger activity in visual brain areas, but that the pupil simply responded very slowly to this? This is unlikely, if you compare the present results to those from an attentional-capture experiment that we conducted previously (23).
In that experiment, attention was covertly (without eye movements) captured toward a bright or dark background. Within about 500 ms, the pupil became larger when attention was captured toward a dark, compared to a bright, background. More specifically, attentional capture toward brightness or darkness modulated the initial pupillary dilation that we also observed in the present experiment, but which was not modulated by semantic brightness (the peak around 0.6 s in Figure 1). Phrased differently, indirect pupillary light responses can emerge far more rapidly than we have observed in the present experiment.
Does this mean that sensory and motor representations are merely by-products of word comprehension, without any function? Not necessarily: While we think it is unlikely that, at least in our experiment, sensory and motor representations were necessary for participants to understand the words, such representations may nevertheless be useful for preparation. For example, when you hear the name of an object that is in your field of view, you are very likely to look at it (32); therefore, the pupillary constriction that is triggered by reading the word 'lamp' may reflect preparation for looking at a lamp (33). Similarly, when you read the word 'night', the words that follow are likely to be related to the concept of darkness, as in the sentence "the night was dark." Therefore, it is conceivable that the sensory and motor representations that are triggered by one word (here 'night') facilitate comprehension of subsequent semantically related words (here 'dark') (34). In this view, sensory and motor representations that arise during word comprehension are not part of the comprehension process (at least not of the word that triggered the representations), but are a way to mentally prepare for perceptions, actions, and words that are likely to occur in the immediate future.
In summary, we have shown that the pupil constricts when reading brightness-compared to darkness-conveying words. This suggests that word comprehension can activate sensory representations, and even trigger physiological (pupillary) responses; however, the pupillary response to semantic brightness arose too late to be a necessary part of word comprehension.
Instead, we have suggested that this response may reflect preparation for the immediate future: When you read the word 'lamp', you prepare to look at a lamp, and you expect subsequent words to be related to brightness; this may cause your pupils to constrict.

Materials and availability
Participant data, experimental scripts, and analysis scripts are available from https://github.com/smathot/semantic_pupil.

Stimuli
We manually selected 121 words from Lexique (35), a large database with lexical properties of French words. There were four word categories: brightness-conveying words (e.g. illuminé or 'illuminated'; N=33), darkness-conveying words (e.g. foncé or 'dark'; N=33), neutral words (e.g. that had approximately the same number of letters, then generating images of these words, and finally iteratively resizing these images until the visual intensity (i.e. summed luminance) of the words was almost identical between the two categories.
In the end, we had a stimulus set in which darkness- and brightness-conveying words were very closely matched; however, as a result of our stringent criteria, our set contained several variations of the same words, such as briller ('to shine') and brillant ('shining'). But given the pupil's sensitivity to slight differences in task difficulty (i.e. lexical frequency) and visual intensity, we felt that matching was more important than having a highly varied stimulus set.
At the beginning of each session, a nine-point eye tracker calibration was performed. Before each trial, a single-point recalibration (drift correction) was performed. Each trial started with a dark central fixation dot on a gray background for 3 s. Next, a word was presented centrally for 3 s, or until the participant pressed the space bar. The participant's task was to press the space bar whenever an animal name was presented, and to withhold response otherwise. Participants saw each word once, with the exception of pénombre which, due to a bug in the experiment, was shown twice. (This is why there are slightly more darkness- than brightness-conveying trials, as shown in Figure 1.) Word order was fully randomized.

Normative ratings
For each word, we collected normative ratings from thirty naive observers (age range: 18-29 y; 17 women), most of whom had not participated in the pupillometry experiment. Participants received €2 for their participation.
Words were presented, one at a time and using the same images as used for the pupillometry experiment, together with a five-point rating scale. On this scale, participants indicated how strongly the word conveyed a sense of brightness ('Very bright' to 'very dark'), or, in a different phase of the experiment, the word's valence ('Very negative' to 'very positive'). Brightness and valence were rated in separate blocks, the order of which was counterbalanced across participants. Based on valence ratings, we calculated the emotional intensity of the words, as the deviation from neutral valence (Intensity = |3-valence|).
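As a small worked example of the intensity formula, the sketch below computes emotional intensity from valence ratings on the five-point scale. It assumes that the formula is applied to each word's mean valence rating across observers; the text does not state whether the deviation was taken per rating or per word mean, so this is one plausible reading, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence

# Midpoint of the five-point valence scale (1 = very negative, 5 = very positive)
NEUTRAL_VALENCE = 3.0


def word_intensity(valence_ratings: Sequence[float]) -> float:
    """Emotional intensity of one word: the absolute deviation of its mean
    valence rating from the neutral midpoint, |3 - valence|."""
    return abs(NEUTRAL_VALENCE - mean(valence_ratings))
```

Because the deviation is absolute, a strongly negative word and a strongly positive word receive the same intensity, which is what allows intensity and valence to be considered separately in the Results.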