Why Do People Continue to Live Near Polluted Sites? Empirical Evidence from Southwestern Europe

Poverty is a major determinant for pollution exposure, according to the US location choice literature. In this paper, we assess the impact of socio-economic status on location choices in the European context. Our analysis relies on an original dataset of 1194 households living in polluted and non-polluted areas in three European countries: Spain, Portugal, and France. We use instrumental variables strategies to identify the socioeconomic causes of location choices. We show that low education, wealth, and income are main reasons for living in polluted areas. We provide several robustness checks testing for the exogeneity of selected instruments. We observe that unobserved heterogeneity tends to understate the impact of socioeconomic status on residence location. Interestingly, we highlight that an important proportion of intermediate social groups (especially young couples) are living in polluted areas, probably because of place attachment and affordable housing facilities. Similarly, we show that middle-income households have lower move-out intentions than other income groups. These latter results contrast the linear vision of environmental inequalities found in the USA.


Introduction
Notwithstanding the strong development of locational choice models in recent decades, empirical evidence for the relationship between socioeconomic status and pollution exposure in Europe remains limited. In the USA, the existing literature identifies a negative correlation between socioeconomic status and exposure to pollution [1,2]. Some authors also find evidence of a causal effect [3]: richer households tend to move away from polluted areas while poorer households are more likely to move in; hence, pollution is leading to social segregation. In Europe, the literature focused on the role of natural amenities, rather than pollution [4], to explain patterns of social segregation.
Results are ambiguous: Schaeffer et al. [5] found that natural amenities increase the mutual segregation between executives and other workers in the French region of Marseille, but not in the region of Grenoble. In the Netherlands, van Duijn and Rouwendal [6] showed that double earners prefer natural living environments while highly educated households placed more value on historical amenities. De Palma, Picard and Waddell [7] highlight the important role of the noise disamenity, next to natural amenities and transport amenities, in location choices in the area of Paris. Schaeffer et al. [5] conclude that location patterns ultimately depend on the interplay between natural and other amenities. According to sociological studies, the dual and linear definition of environmental inequalities should be contrasted in Europe. For instance, in French towns near hazardous industrial wastes, Flanquart, Hellequin and Vallet [8] observe an over-representation of households with moderate standards of living (and not the poorest). Inequalities resulting from pollution also vary from city to city [9]. In the end, there is a non-trivial relationship between environmental and social constraints, explaining what may be termed socio-environmental segregation. 1 In this paper, we study how socio-environmental segregation develops in polluted areas in Southwestern Europe.
Several theoretical pathways may lead certain social groups to disproportionately live in polluted areas [2]. First, industries tend to locate their polluting activities in 'favorable' geographic and economic areas, where land is available, the labor force is cheap, and transport networks are well developed. Second, households choose their residential location depending on their willingness to pay [10]: even if lower-income households would prefer to live in a cleaner area, they are not willing to pay the higher price. Hence, in the long run, poorer households (as well as certain ethnic groups) end up in more polluted areas [11]. 2 Next, households in polluted areas generally have less ability to influence governments and may less easily mobilize against existing rules in favor of industries [2]. Finally, residents may accept a certain level of pollution exposure in exchange for compensation provided by the polluting firm [2,12], such as employment opportunities and direct investments in local amenities (e.g., parks, sports infrastructure, and cultural centers). 3 Note that the over-representation of households with moderate standards of living observed by Flanquart et al. [8] in French polluted areas may be explained by this pathway. Moreover, place or community attachment might also explain the over-representation of intermediate social classes in polluted areas [13].
Given the ambiguities concerning residential preferences in Europe, this article aims to contribute to the literature by quantitatively identifying the factors that determine the probability of living in polluted areas. In addition, we investigate the main determinants of the intention to move out of polluted areas in the next 5 years. Note that we analyze the admitted determinants (i.e., economic and social circumstances), besides the more hidden (and often omitted) determinants such as community attachment and risk taking behavior [8,13,14]. Our methodological approach, based on an instrumental variables (IV) strategy, addresses potential endogeneity issues due to reverse causality and unobserved heterogeneity in the relationship between household socioeconomic status and residential location. To our best knowledge, the use of IV strategy in this context is rare [15,16,17], which is another notable contribution of this study. The respondent's height is used as main instrument, arguing that this factor is strongly correlated with socioeconomic status but exogenous to residential location. Our empirical tests validate the conditions that an instrument must satisfy.
Our analysis is based on an original dataset of 1194 households and 2787 individuals in three study areas in Southwestern Europe (Spain, Portugal, and France). Comparing the populations from several polluted areas with the populations from similar but non-polluted areas, our results emphasize the presence of strong environmental inequalities. The IV strategy shows that socioeconomic determinants (i.e., education, income, and wealth) have strong and negative effects on the probability of living in polluted areas. We clearly observe that unobserved heterogeneity, such as place attachment and local facilities, tends to understate the effects of socioeconomic status on residential location. In contrast to the results from the US literature, we confirm sociological assumptions that European polluted areas also seem to be an alternative for young families with lower-middle socioeconomic status who find an interest in affordable housing facilities and employment opportunities, services and infrastructures [8]. In line with these intuitions, we find that middle-income families seem to see advantages in remaining in polluted areas. As expected, households that are less risk averse and more attached to their community also tend to live in polluted areas. We conclude that such European-specific endogenous non-linearities contribute to lowering the effect of poverty on risky housing.
The structure of the article is the following. In Sect. 2, we describe the database and provide some contextual information about the case studies on which we base our analysis. In Sect. 3, we explain the methods we use to identify the main determinants of the probability of living and continuing to live in polluted areas versus living in cleaner ones. In Sect. 4, we present our results and, in Sect. 5, we conclude and discuss the related public policy implications.

An Original Dataset
From October 2018 to January 2019, we conducted three household surveys of 1194 households in France, Portugal, and Spain, specifically designed to study the socioeconomic issues of pollution exposure. Using cadastral data, a random selection of housings was employed to guarantee the representativeness of study areas regarding demographic and socioeconomic characteristics. We collected data of 684 households (1589 individuals) in polluted areas and of 510 households (1198 individuals) in corresponding control areas, creating an original comparative dataset, the "Comparative Survey on Pollution Exposure" (CSPE). More precisely, the CSPE is representative of the following polluted areas: Viviez in France (156 households and 293 individuals); the municipality of Estarreja in Portugal (300 households and 739 individuals); and three villages of the Spanish Sierra Minera (Portman, Estrecho de San Ginès and Alumbres) located to the east of Cartagena (228 households and 557 individuals). The non-polluted control areas are as follows: Montbazens in France (138 households and 309 individuals); the municipality of Vagos in Portugal (200 households and 437 individuals); and a group of villages (Portus, Galifa, Perin, La Corona, Cantera, and Molinos Marfagones) located to the west of Cartagena in Spain (172 households and 452 individuals). Polluted areas are well-known hotspots of pollution, as confirmed by the literature in geochemistry and mineralogy [18,19,20]. Pollution is spread in the polluted municipalities but does not spill over to the control areas. Control areas were selected using region-specific literature. For example, Inácio, Neves and Pereira [21] and Guihard-Costa et al. [22] explain that Estarreja and Vagos had the same natural amenities before the installation of the chemical complex in Estarreja. Similarly, the French Institute of Public Health used Montbazens as control area to infer the health effects of pollution exposure in Viviez [23]. 
Polluted and control areas are sufficiently close so that we are able to suppose that residential choices can be made either in one area or the other. The polluted and control study areas are shown in Fig. 1.
It is important to note that this quantitative survey was originally conducted to complete a set of well-documented qualitative interviews. Although the results of the qualitative field campaigns are not directly included in this article, they greatly contributed to our understanding of the study context and issues.

Context of the Study Areas
A detailed description of the context of our study areas is given in Appendix 1. In a nutshell, our Southwestern European sample makes it possible to observe the three stages that characterize several polluted areas around the world: (i) ex-mining towns (Portman and ESG); (ii) a heavy metal industry undergoing technological reconversion (Viviez); and (iii) active (petro-)chemical complexes (Estarreja and Alumbres). Table 4 in Appendix 1 provides means of several important variables of our study in polluted and control areas as well as mean-comparison tests. Figures 5 and 6 in Appendix 1 describe at more length the main reasons that motivate residents to live in the respective areas and their intentions to move out. Table 5 in Appendix 1 provides mean-comparison tests for these reasons between polluted and control areas.
Fig. 1 Mapping of polluted and control areas. Source: OSM, authors' computation

Methods
In this article, we study the determinants of two types of outcome indicators. First, we create a binary variable "living in a polluted area" which takes the value 1 for households living in a polluted area, and 0 for households living in a control area. This allows us to analyze the reasons why households live in each area. Next, we create a binary response variable that identifies households who plan to move out in the next five years. This allows us to explore potential dynamics in the environmental injustice process. In both cases, we estimate the probability of the outcome indicator being 1, using linear probability models. Referring to the theoretical pathways discussed in Sect. 2, our empirical analysis especially captures long-term effects between household socioeconomic indicators and pollution exposure. Indeed, the industrial sites we study have been settled since at least the 1950s (i.e., for more than two generations). In other words, while the oldest residents might remain for emotional reasons, several current residents already made their (re)location choice as a function of their economic constraints and their willingness to pay for a clean environment.

Descriptive Statistics
Let us first describe some of the features of our dataset. As shown in Fig. 2, the respondent's pollution perception of the area (measured with a 1-to-5 Likert scale) is strongly correlated with the probability of living in a polluted area. Hence, the question is why people continue to live in polluted areas, even if they know about related pollution issues. Figure 3 shows that the probability of living in a polluted area decreases when household incomes differ from the average community income. As observed by Guo and Bhat [24], the lower the absolute gap, the higher the risk of living in a polluted area.
However, Fig. 4 suggests that it is not mainly the poorest households (Q1) who live in polluted areas, nor the richest households as expected (Q4 and Q5), but a lower-middle class (Q2 and Q3). 4 Nonetheless, this result should be interpreted with caution given that there might be endogeneity problems.
4 When endogeneity problems are suspected (e.g., reverse causality and unobserved heterogeneity) and may bias classical estimates such as OLS or Probit, an IV strategy is commonly used in applied economics to obtain unbiased coefficient estimates [28]. In our study context for instance, the presence of reverse causality and unobserved heterogeneity might overstate or understate the real impact of poverty on the probability of living in a polluted area. By introducing exogenous variations strongly correlated with poverty status but uncorrelated with residential location (i.e., the instrument), IV models only capture the effect of poverty status on pollution exposure that operates through the instrument. If the selected instruments meet both conditions (i.e., being relevant and having no direct link with residential location), one can conclude that the fitted coefficients are unbiased and that a causal impact is observed. In other words, IV estimates will identify the effect of exogenous variations in poverty status (operating through the instrument) on the probability of living in polluted areas.
Appendix 1 gives some additional information on the way pollution perception is distributed in the sample, namely that it increases with income (except for the highest quantile in polluted areas) and length of residence.

Empirical Challenges
Apart from pollution and industrial features, our sample of polluted and non-polluted areas may initially differ in terms of ecological and historical attractiveness. 5 In other words, it is impossible to be sure that our selected control areas are perfect counterfactuals of our selected polluted areas. A perfect counterfactual means that these control areas would evolve in the same way as the polluted areas if they had also benefited from the installation of an industrial site (or a mining company). Given that our control areas are potentially imperfect counterfactuals, simply comparing polluted and non-polluted samples might lead to a selection bias. Hence, the main challenge is to deal with endogeneity problems. First, our estimations could be biased because of the presence of reverse causality between household socioeconomic status and pollution exposure. Not only does pollution reduce housing prices, which potentially attracts poor households, but polluting industries may also emit residuals that are toxic for human health [25]. Daily exposure to these chemical residues may affect the capacity for socioeconomic advancement of residents through loss of productivity. Second, another source of endogeneity may originate from the omission of factors simultaneously correlated with household socioeconomic status and outcome indicators. In our context, we assume that heterogeneous environmental and geographical preferences may bias the estimates. Indeed, these preferences can be simultaneously correlated with socioeconomic status and the unexplained part of (re)location choice. It is widely recognized that different social groups have specific perceptions and preferences about health, pollution and space, and thus a different ability to pay for desirable community amenities like a clean environment, natural spaces, nice landscapes, quality schools, public safety, employment accessibility, and accessible retail outlets [2].
To neutralize such a selection bias, we use an IV strategy that allows the effect of household socioeconomic status to be robustly assessed. 6 Xu and Sylwester [17] used a similar approach to assess the impact of area air pollution on emigration flows.
Another challenge is to correct the expected intra-group correlation within polluted villages that could reduce the variance of certain factors, and thus overestimate their significance. Indeed, it is well known that households tend to live among or relocate around groups of households with similar incomes [26]. Moreover, the presence (or absence) of public facilities is an important predictor of residential choice [24], which could reinforce the intra-correlation within an area. To control for the potential intra-group correlations within villages, cluster robust standard errors are systematically estimated at the village level (i.e., the standard errors are not calculated at the individual level but at the village level), see Wooldridge [27].
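To illustrate why clustering matters, the following minimal sketch compares a naive standard error with a cluster-robust one for the slope of a simple regression. It is not the estimation code used in this study: the data are simulated, the numbers of villages and households are hypothetical, and the function names `simple_ols` and `slope_standard_errors` are introduced only for this example. The cluster-robust formula is the basic sandwich form without small-sample correction.

```python
import math
import random

def simple_ols(x, y):
    """Return intercept, slope and residuals of y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def slope_standard_errors(x, resid, cluster):
    """Naive (i.i.d.) and cluster-robust standard errors of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Naive: residual variance divided by the variation in x.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_naive = math.sqrt(sigma2 / sxx)
    # Cluster-robust: sum the scores within each cluster before squaring.
    scores = {}
    for xi, e, g in zip(x, resid, cluster):
        scores[g] = scores.get(g, 0.0) + (xi - x_bar) * e
    se_cluster = math.sqrt(sum(s * s for s in scores.values())) / sxx
    return se_naive, se_cluster

# Simulated data (assumption): 12 villages with 50 households each.
rng = random.Random(0)
x, y, village = [], [], []
for g in range(12):
    village_x = rng.gauss(0, 1)      # regressor largely shared within a village
    village_shock = rng.gauss(0, 1)  # unobserved village-level shock
    for _ in range(50):
        xi = village_x + rng.gauss(0, 0.2)
        x.append(xi)
        y.append(0.3 * xi + village_shock + rng.gauss(0, 1))
        village.append(g)

_, slope, resid = simple_ols(x, y)
se_naive, se_cluster = slope_standard_errors(x, resid, village)
print(f"slope={slope:.3f} naive SE={se_naive:.3f} clustered SE={se_cluster:.3f}")
```

When both the regressor and the error share a village-level component, the naive standard error treats correlated households as independent and is therefore too small; the clustered version is wider.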

The Model
Based on Schirmer, Eggermond and Axhausen [26], we frame our estimation models on the following dimensions measured at the household level: socioeconomic (income, education, and wealth), demographic (age group proportions, gender proportion, marital status), and housing factors (housing size). In exploratory estimations, we also investigate the influence of community-based factors (length of residence and family network) and respondent-based factors (risk-taking behavior). Table 6 in Appendix 2 describes the explanatory variables that we tested. 7 More formally, we consider a structural equation $y = F(x_1, x_2, x_3, x_4) + \varepsilon$, with $F$ designating the functional form of the equation. In the first place, we use a linear probability model (see for example [28], p. 454). Hence, OLS regressions are performed for Eq. 1:

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon \tag{1}$$

with $y$ being the dependent variable "living in a polluted area" and the following types of explanatory variables: $x_1$ socio-economic factors, $x_2$ demographic factors, $x_3$ community factors, $x_4$ individual factors; $\beta_i$ the corresponding estimated coefficients and $\varepsilon$ the error term. Despite the comprehensive set of observed factors included in the analysis, these models potentially remain sensitive to endogeneity problems, mainly due to reverse causality and variations in unobserved individual preferences (landscape preference, geographical location, specific local amenities, place attachment, etc.). Mathematically, $\beta_1$ is biased if socioeconomic factors are correlated with $\varepsilon$ (i.e., the unexplained part of the variance of the dependent variable).
Therefore, to establish a causal inference regarding socioeconomic factors, we apply an IV strategy based on two-stage least squares (2SLS) estimations, following Wooldridge [28]:

$$\begin{aligned} y &= \beta_1 \hat{x}_1 + \beta_2 x_2 + \varepsilon \\ x_1 &= F(z, x_2) + \nu = \gamma_1 z + \gamma_2 x_2 + \nu \end{aligned} \tag{2}$$

with $y$ the dependent variable, $x_2$ the explanatory variables and $\varepsilon$ the error term as above, $\hat{x}_1$ the estimated socio-economic factors (i.e., the fitted values of $x_1$), $F$ the functional form of the estimation of socioeconomic factors, $z$ the instruments, $\gamma_i$ the corresponding estimated coefficients, and $\nu$ the corresponding error term. 8 As suggested by Angrist and Pischke [29], we only integrate exogenous control factors in the IV model in order to focus on the causal impacts of household education, wealth and income. Thus, we only control for demographic and housing heterogeneity across households (i.e., age group proportions, gender proportion, marital status, housing size, and country fixed effects). Indeed, the inclusion of potentially endogenous control factors (e.g., owning a garden, length of residence, family network, and health risk behavior) could bias IV estimates insofar as we cannot be sure whether these factors are determinants or consequences of living in polluted areas. In the first-stage regression, we linearly regress socioeconomic factors on instruments and covariates. Then, the fitted values of socioeconomic indicators from the first stage are included in the structural equation to neutralize the unobserved part of the variance of the dependent variable correlated with socioeconomic factors. In other words, assuming exogenous instruments, fitted values of socioeconomic status are suitably independent of omitted factors ($\varepsilon$). This means that the model no longer has endogeneity problems and produces consistent estimates. The case-control design of our database does not allow the use of locality-specific data as instruments such as meteorological variations (e.g., inadequate variation across the study areas).
For this reason, we use the reported height (in meters) of the respondent as the main instrument for socioeconomic factors. The literature abounds with works showing a strong relationship between individual height and socioeconomic status. Indeed, there is a vicious cycle between short stature and poverty, notably due to hazardous consumption by mothers during pregnancy (e.g., smoking and alcoholism), micronutrient deficiencies during childhood, schooling and labor market discrimination, and productivity loss due to lower cognitive skills [30,31]. In brief, not only do poor households have shorter children, but shorter individuals also have lower success in school and employment as well as lower earnings [32,33]. Even if the negative correlation between poverty and height is expected to be strong, which is the first condition for a valid instrument, the exogeneity condition is always debatable. Unfortunately, there is no perfect instrument in observational data. Indeed, according to the health economics literature, a child's exposure to pollution might affect growth and then adult height, which may bias IV estimates. For example, Currie and Neidell [34] observed significant impacts of air pollution peaks on infant mortality risks. In connection with our topic, Bailey, Hatton and Inwood [35] found a negative relationship between adult height and intense inhalation of coal smoke in early childhood during the English industrial revolution. However, this study is only correlational and does not identify a causal effect. Epidemiological studies identified a critical window from maternal gestation to early childhood (around age 2) in which a child's physical and intellectual growth is highly dependent on environmental factors such as feeding practices and maternal consumption [36]. Moreover, Selevan, Kimmel and Mendola [37] reported that the health risk related to heavy metal absorption decreases significantly after age 6.
Thus, one can assume that chronic exposure to pollutants occurring up to a certain age may impair body development. Rosales-Rueda and Triyana [38] succeeded in identifying a causal impact of maternal exposure to air pollution peaks on a child's height-for-age some years later, but this effect is relatively small and only significant when such risky maternal inhalations occur during the gestation period. Hence, pollution is unlikely to affect individual growth if the individual starts to be exposed after a certain age.
To check for the exogeneity of the respondent's height, we run IV estimates employing several sample restrictions: (i) households in which the respondent was not born in the area; (ii) households in which the respondent was at least 4 years old when they moved into the area; (iii) households in which the respondent was at least 18 years old when they moved into the area. To our understanding, if the estimates remain significant in these restricted samples, it means that the link between height and pollution exposure is mainly due to socioeconomic status. In contrast, if the estimates do not remain significant in the restricted samples, it means that reverse causality problems tend to overstate the effect of poverty on pollution exposure. Furthermore, we provide an additional test of exogeneity by comparing the correlations between the length of area residence and height in polluted areas and control areas. Such a correlation might cast doubt on the identifying assumption.
Next, we consider an additional instrument to implement over-identification tests: the respondent's parental education (a dummy variable identifying whether at least one of the two parents completed high school). While the correlation between parents' and children's education is obvious, the exogeneity of parental education with respect to residential location can reasonably be questioned. For instance, Bayer, Keohane and Timmins [39] suggest that one's current location is highly dependent on parental location, the latter being highly correlated with the level of parental education. However, this concern is alleviated when we restrict the sample to individuals who were already 18 years old when they moved into the area. For them, there is no assumed correlation between parental location and current location. Regarding the over-identification test, if we fail to reject the null hypothesis of no correlation between the instruments and the error term, it means that the latter instrument is valid, based on the assumption that the respondent's height is exogenous, or vice versa.
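As a rough illustration of the just-identified case described above, the sketch below simulates an unobserved factor that contaminates OLS and shows that a single instrument recovers the assumed effect. Everything here is hypothetical: the data are simulated, the outcome is continuous rather than binary for simplicity, no control variables are included, and the effect size of -0.5 is an arbitrary choice, not an estimate from this study.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Covariance with 1/n normalisation (the factor cancels in ratios)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

rng = random.Random(1)
n = 20000
true_effect = -0.5  # hypothetical causal effect of status on the outcome
z, x, y = [], [], []
for _ in range(n):
    height = rng.gauss(0, 1)     # instrument (standardised, simulated)
    omitted = rng.gauss(0, 1)    # unobserved factor driving status and outcome
    status = 0.8 * height + omitted + rng.gauss(0, 1)
    outcome = true_effect * status + omitted + rng.gauss(0, 1)
    z.append(height)
    x.append(status)
    y.append(outcome)

# OLS slope: contaminated by the unobserved factor.
ols = cov(x, y) / cov(x, x)

# Stage 1: regress status on the instrument and keep the fitted values.
b1 = cov(z, x) / cov(z, z)
a1 = mean(x) - b1 * mean(z)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress the outcome on the fitted values.
tsls = cov(x_hat, y) / cov(x_hat, x_hat)

# With one instrument, 2SLS equals the ratio cov(z, y) / cov(z, x).
iv_ratio = cov(z, y) / cov(z, x)
print(f"OLS={ols:.3f}  2SLS={tsls:.3f}  true={true_effect}")
```

In this simulated setting the omitted factor pushes the OLS slope toward zero, so OLS understates the magnitude of the effect while the instrumented estimate stays close to the assumed value.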
Lastly, to identify the determinants that influence the decision to leave a polluted area, we regress the intention to move out in the next 5 years on several factors interacted with the fact of living in a polluted area versus living in a cleaner area. We consider that such a model specification (shown in Eq. 3) is not particularly affected by endogeneity-related biases. Indeed, an inverse causality bias is not prevalent since the intention to move out is unlikely to affect current income, education or wealth, except if households with move-out intentions work more in order to save money and cover the costs of relocation. The latter scenario seems unlikely. Likewise, we find no reason to think that unobserved factors are simultaneously correlated with socioeconomic status and emigration intentions. In fact, by applying a similar IV strategy as in Eq. 2, we do not detect the presence of potential endogeneity problems. Hence, we only run linear probability estimations based on OLS for Eq. 3:

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 (x_1 \times x_5) + \varepsilon \tag{3}$$

with $y$ being the dependent variable "intention to move out," and the following types of explanatory variables: $x_1$ socio-economic factors, $x_2$ demographic factors, $x_3$ community factors, $x_4$ individual factors, $x_5$ living in polluted areas; variables $x_1$ and $x_5$ are tested in interaction, as shown in the last parenthesis; $\beta_i$ are the corresponding estimated coefficients and $\varepsilon$ the error term. 9
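To make the role of the interaction term concrete, consider the simplest possible case: a linear probability model with a constant, one binary socioeconomic indicator, the polluted-area dummy, and their interaction. In such a saturated model the OLS coefficients are exactly differences between cell means. The sketch below uses invented counts, a hypothetical binary version of $x_1$ (for instance a middle-income dummy), and helper names (`make_cell`, `share`) that are introduced only for this illustration.

```python
def make_cell(x1, x5, n_yes, n_total):
    """Build n_total hypothetical households, n_yes of which plan to move out."""
    return [{"x1": x1, "x5": x5, "move_out": 1 if i < n_yes else 0}
            for i in range(n_total)]

def share(rows, x1, x5):
    """Share of households planning to move out within one (x1, x5) cell."""
    cell = [r["move_out"] for r in rows if r["x1"] == x1 and r["x5"] == x5]
    return sum(cell) / len(cell)

# Invented counts (assumption): x1 = 1 for the socioeconomic group of interest,
# x5 = 1 for households living in a polluted area.
rows = (make_cell(0, 0, 2, 10) + make_cell(1, 0, 3, 10)
        + make_cell(0, 1, 4, 10) + make_cell(1, 1, 2, 10))

b_const = share(rows, 0, 0)                 # baseline share, control areas
b_x1 = share(rows, 1, 0) - b_const          # group gap in control areas
b_x5 = share(rows, 0, 1) - b_const          # polluted-area gap, baseline group
b_inter = (share(rows, 1, 1) - share(rows, 0, 1)) - b_x1  # extra gap in polluted areas

print(f"const={b_const:.2f} x1={b_x1:.2f} x5={b_x5:.2f} x1*x5={b_inter:.2f}")
```

With these invented numbers the interaction coefficient is negative: the group of interest is less inclined to move out of polluted areas than its gap in control areas would suggest, which is the kind of pattern the interaction term in Eq. 3 is designed to detect.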

OLS and 2SLS Estimates
To test the impact of households' socioeconomic status on the probability of living in polluted areas, we apply an IV strategy based on 2SLS estimations. 10 Table 1 depicts the results for three different indicators of households' socioeconomic status: the number of educated adult members in the household (those who have completed at least a high school diploma), a 7-score wealth index and the logarithm of monthly household income corrected in purchasing power parities (PPP) based on 2017 US dollars. For each socioeconomic indicator, we run linear probability models using OLS, just-identified 2SLS and over-identified 2SLS estimators. In just-identified 2SLS estimates, the respondent's height is used as instrument, whereas in over-identified 2SLS estimates we add the parents' level of education as second instrument.
In Table 1, OLS estimates show very slight correlations between socioeconomic status and residential location. Except for education where the fitted coefficient is significant (column 1), wealth and income indicators have a negative but non-significant relationship with the probability of living in polluted areas (columns 4 and 7). By instrumenting socioeconomic status by the respondent's height, just-identified IV estimates show significant and negative impacts of education, wealth, and income on the probability of living in polluted settings (columns 2, 5, and 8). For instance, column 2 shows that one additional educated adult in the household reduces the probability of living in a polluted area by 46 percentage points (almost 6 times stronger than OLS estimates). Similarly, an increase of monthly incomes by 10% significantly decreases the risk of living in a polluted area by 6.5% (more than 10 times stronger than OLS estimates). Over-identified IV estimates show similar results, even if the instrumented coefficients are a bit lower (columns 3, 6, and 9).
The higher magnitudes of fitted coefficients from IV estimates suggest that OLS estimates strongly understate the impact of socioeconomic status on residential choices, probably due to a selection bias and unobserved heterogeneity. More specifically, the difference between OLS and IV results may be due to unobserved factors correlated with household socioeconomic status. Indeed, according to the sociological literature (e.g., [8]), polluted areas have several omitted characteristics that may attract lower-middle social classes to live there, such as employment opportunities and accessibility, housing facilities, and other advantages (e.g., community satisfaction and attachment). In accordance with this assumption, Table 1 shows that young and lower-middle aged people tend to live more often in polluted areas, either in a couple or as singles. Likewise, coefficients for housing size are significant and positive in all fitted models, corroborating the assumption that households with lower socioeconomic status can live in larger houses at affordable prices in polluted areas. Furthermore, Table 7 in Appendix 2, examining the factors that are correlated with the probability of living in a polluted area, clearly confirms the disproportional attraction of young families with lower-middle socioeconomic status. We find an inverted-U relationship between the wealth index or income and the probability of living in polluted areas. 11 Moreover, housing size is an important predictor of this probability. 12 Other control factors are tested in these exploratory estimates. We namely observe that community attachment, the presence of a family network and the length of residence in the area increase the probability of living near polluted sites, while risk aversion in the domain of health risks decreases the probability of living there. 13
9 The turning point is around 1918 $PPP per month and household, i.e., 1113 euros in Portugal, 1230 euros in Spain, and 1489 euros in France, corresponding to lower-middle incomes in each country.
10 An alternative model specification adding interaction terms between age group proportions and housing size shows that housing size particularly increases the probability of living in polluted areas for the 30-45 age group (not shown).
11 We measure the environmental health risk aversion by asking the respondent to evaluate on a 1-to-5 Likert scale his/her willingness to live in a polluted area that may decrease his/her life expectancy by 5 years.
12 To be suitable, an instrument must meet two conditions: (i) it must be a non-weak predictor of the endogenous variable conditional on control variables; and (ii) it must not be directly related to the error component in the structural equation (i.e., not be correlated with the unexplained part of the probability of living in a polluted area). The second condition, called the exclusion restriction assumption, means that our instruments should not correlate with the probability of living in a polluted area through channels other than the household socioeconomic status [28].

Validation of the IV Strategy
Empirical tests suggest that our instruments, particularly the respondent's height, satisfy the two requirements of suitable instruments (Wooldridge [28]). 14 First, both instruments are non-weak predictors of the endogenous variable, conditional on control variables. As shown in Table 8 in Appendix 2, even after controlling for exogenous covariates, the height of the respondent and the education of his/her parents are both significant predictors of household socioeconomic status. It is reassuring to see in Table 1 that all first-stage F-statistics on the excluded instruments are relatively high.
Second, the exclusion restriction assumption can be partially tested when the endogenous regressor is overidentified. Over-identification tests suggest the absence of correlation between the error terms of the structural equation and the instruments (Table 1). Likewise, Figure 9 in Appendix 2 shows that longer exposure to soil pollution does not affect height differently in polluted areas than in control areas. In other words, residents of polluted areas are not shorter because of longer exposure to pollution but for other reasons, such as poverty. In addition, we check the robustness of the IV estimates by restricting the sample to the following: (i) households for which the respondent was not born in the area; (ii) households for which the respondent was at least 4 years old when he/she moved to the area; (iii) households for which the respondent was at least 18 years old when he/she moved to the area. Table 2 presents the IV estimates based on these restricted samples. Interestingly, Table 2 reports lower effects than Table 1, suggesting that potential reverse causality could slightly overstate the impact of poverty on pollution exposure. However, even after neutralizing all suspected links between the instruments and pollution exposure, the negative impact of socioeconomic status on the probability of living in polluted areas remains strong and significant, even in just-identified 2SLS estimates where only the respondent's height is used as an instrument. We observe in these bias-robust estimates that one additional educated adult member decreases the risk of living in polluted areas by up to 40 percentage points. Similarly, one additional owned asset and 60% additional income decrease this risk by around 30 percentage points.
(1) In just-identified regressions, only the respondent's height in meters is used as an instrument. In over-identified regressions, we add the parental education level (at least a high-school degree) of the respondent as a second instrument to perform over-identification tests.
(2) Robust standard errors are reported, correcting for intra-locality correlation. Significance levels are ***1%, **5%, *10%.
(3) Wealth index is the sum of the following owned (or not) assets: former house, second house, car, air conditioner, computer, cellphone, and financial assets. Thus, the wealthiest households have a score of 7 while the most deprived households have a score of 0. Then, this score is log-transformed after adding 1 to avoid generating missing values (i.e., log(0) is undefined). We also tested the impact of this variable employing a log-transformation. However, the results remain the same.
13 As Angrist and Pischke ([29], p. 157) argue, "if you can't see the causal relation of interest in the reduced form, it's probably not there".
14 Other control variables available in Table 10 of Appendix 2 do not influence move-out intentions differently between polluted and non-polluted areas. For instance, the attractiveness of an area and the perception of social cohesion are two such factors, probably because of a collinearity problem with residential location. In the same way, household education, marital status, and the presence of a family network affect the intention to move out independently of the residential location. In any area, living close to one's family reduces move-out intentions, whereas being single and educated increases the intention to move, probably due to better social and professional opportunities elsewhere.
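The logic of a just-identified 2SLS estimate can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the variable names, the coefficients, the continuous outcome (the paper's outcome is a binary residence indicator), and the homoskedastic first-stage F-statistic (the paper reports cluster-robust statistics). It is not the authors' code or data. The sketch shows three things discussed in this section: the first-stage F-statistic on the excluded instrument, an OLS coefficient biased toward zero by an unobserved factor, and the fact that, with a single instrument, the reduced-form coefficient equals the first-stage coefficient times the 2SLS coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: every number below is invented for illustration only.
height = rng.normal(size=n)        # instrument (standardized stand-in)
unobserved = rng.normal(size=n)    # omitted factor affecting SES and outcome
ses = 0.5 * height + unobserved + rng.normal(size=n)  # endogenous regressor
true_effect = -0.3
# The omitted factor enters with a positive sign, which pushes OLS toward zero.
outcome = true_effect * ses + 0.4 * unobserved + rng.normal(size=n)


def ols(X, y):
    """Return least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


const = np.ones(n)
Z = np.column_stack([const, height])  # constant + excluded instrument

# First stage: regress the endogenous regressor on the instrument.
pi = ols(Z, ses)
ses_hat = Z @ pi
resid = ses - ses_hat
sigma2 = resid @ resid / (n - Z.shape[1])
se_pi = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
first_stage_F = (pi[1] / se_pi) ** 2  # one instrument: F equals t squared

# Naive OLS, 2SLS (second stage on fitted values), and reduced form.
beta_ols = ols(np.column_stack([const, ses]), outcome)[1]
beta_iv = ols(np.column_stack([const, ses_hat]), outcome)[1]
reduced_form = ols(Z, outcome)[1]

print("first-stage F:", first_stage_F)
print("OLS estimate:", beta_ols)
print("2SLS estimate:", beta_iv)
print("reduced form:", reduced_form, "= pi * beta_iv:", pi[1] * beta_iv)
```

The second-stage standard errors from this manual two-step procedure would be incorrect, so the sketch only reports point estimates; a dedicated IV routine would be needed for inference.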
Furthermore, by running a reduced-form regression of the dependent variable on the instruments and covariates (Table 9 in Appendix 2), we can check whether the causal effect of interest is visible in the reduced form. Indeed, we find a significant correlation between our instruments and the probability of living in a polluted area, which is proportional to the effect of household socioeconomic status on residential location. This means that when household socioeconomic status is omitted, we continue to observe its influence on the dependent variable through the instruments. 15 Consequently, we can assume that our instruments are not only strong (i.e., meet the first requirement) but also exogenous (i.e., meet the second requirement).

What Factors Motivate People to Move Out of Polluted Areas?
Table 3 reports OLS estimates of the main identified determinants of the intention to move out in the next 5 years (the full table is available in Table 10 of Appendix 2). 16 Several divergences appear between polluted areas and control areas. First, compared to control areas, living in polluted areas increases the intention to move out in the next 5 years by 3.4 percentage points (column 1). Focusing on interaction terms, we observe that both the wealth index and income groups affect the intention to leave a polluted area. Column 2 of Table 3 shows that one additional owned asset increases the move-out intention by 2.2 percentage points in polluted areas. In contrast, Table 10 in Appendix 2 shows that such an additional asset decreases this intention by 1.9 percentage points in control areas (i.e., a gap of 4.1 percentage points between the two types of area). Column 2 of Table 3 also suggests, hypothetically, that among extremely deprived households (with not even one owned asset), living in a polluted area does not affect move-out intentions. In column 3, we find that, in polluted areas, households belonging to the third (Q3) and fourth (Q4) quintiles have lower move-out intentions than households belonging to the richest income category (Q5), by around 16 and 14 percentage points, respectively. In control areas, though, income is not correlated with this intention (Table 10 in Appendix 2). These results emphasize a non-linear relationship between household socioeconomic status and the intention to leave a polluted area. Regarding demographic factors, column 4 in Table 3 shows that the average age of adult household members significantly reduces the motivation to move out in households living in polluted areas (10 extra years of age reduce move-out intentions by 3 percentage points), which is consistent with the literature [40]. Community attachment might also explain why older people decide to continue living in the same place. Finally, we find a surprising result concerning community involvement. While participation in community events increases the intention to stay in cleaner control areas by 5.7 percentage points (Table 10 in Appendix 2), the same participation increases the intention to leave a polluted area by 11.3 percentage points. These findings highlight the presence of a link between community involvement and sensitivity to pollution, as mentioned by Chanel et al. [4]. Moreover, these results might emphasize the level of social exclusion of several households that prefer to stay rather than move out. 17
15 One could also assume that this latter result reflects the presence of reverse causality, if individuals with move-out intentions become less engaged in community activities as they plan to cut ties soon. However, the fact that such behavior is not observed in clean areas clearly invalidates this possibility.
Table 3 Main factors correlated with the intention to move out in the next 5 years (OLS estimates). Source: Authors' calculation from the CSPE database. (1) Standard errors are robust to intra-group correlation. Significance levels are: ***1%, **5%, *10%. (2) Wealth index is the sum of the following owned (or not) assets: former house, second house, car, air conditioner, computer, cellphone, and financial assets. Thus, the wealthiest households have a score of 7 while the most deprived households have a score of 0.
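The 4.1 percentage point gap in the effect of an additional asset can be read off an interaction specification. The following is a minimal sketch, assuming a linear probability model in which the wealth index $W_i$ is interacted with a polluted-area dummy $P_i$. The symbols $M_i$ (move-out intention), $\alpha$, $\beta$, $\gamma$, $\delta$, $\theta$, and $X_i$ (other controls) are illustrative notation and do not come from the paper's tables. The exact layout of the reported coefficients across Table 3 and Table 10 may differ from this reading.

```latex
\begin{align*}
M_i &= \alpha + \beta P_i + \gamma W_i + \delta\,(P_i \times W_i) + X_i'\theta + \varepsilon_i \\
\left.\frac{\partial M_i}{\partial W_i}\right|_{P_i = 0} &= \gamma \approx -1.9 \text{ pp (control areas)} \\
\left.\frac{\partial M_i}{\partial W_i}\right|_{P_i = 1} &= \gamma + \delta \approx 2.2 \text{ pp (polluted areas)} \\
\delta &\approx 2.2 - (-1.9) = 4.1 \text{ pp}
\end{align*}
```

Under this reading, $\beta$ is the effect of living in a polluted area for a household with no owned asset ($W_i = 0$), which is the case the text describes as showing no difference in move-out intentions.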

Discussion and Conclusion
Our study identified the main determinants that explain why households live and continue to live near polluted sites in Southwestern Europe. We implemented an original IV procedure based on the respondent's height, which is rarely done in the related literature. Overall, our results corroborate the existence of socio-environmental inequalities in the European context. First, we find that household education strongly reduces the risk of living in polluted areas.
In line with the results from the US literature, environmental disamenities tend to ward off educated households, whereas environmental amenities attract these population groups [41]. Negative impacts of wealth and income are also highlighted, suggesting that polluted areas also represent an environment-poverty trap in European countries [42]. Interestingly, contrary to the USA, European specificities make the correlation between socioeconomic status and pollution exposure ambiguous. Indeed, our data distribution suggests that social segregation is not as linear as in the USA. Additional exploratory analyses show that a lower-middle income class disproportionately lives in polluted areas in Southwestern Europe. Note that housing size and place attachment significantly increase the probability of living in polluted areas for this social group. In addition, higher proportions of couples and young adults live in polluted areas to benefit from bigger houses. Hence, as pertinently discussed by Flanquart, Hellequin and Vallet [8], polluted areas may constitute an acceptable residential alternative for households with moderate standards of living insofar as such towns provide several amenities at an affordable price. The overrepresentation of a lower-middle class in larger houses also matches the assumption of Banzhaf, Ma and Timmins [2] about the existence of compensations and benefits for households that accept living near a polluting industry. We conclude that such unobserved local amenities might be a source of endogeneity and are likely to contribute to understating the effect of socioeconomic status on residential location when classic estimators are used.
In terms of move-out intentions, we detect some non-linearities that might modify the linear vision of environmental inequalities. In accordance with mainstream theory, it is particularly the richest households who plan to move out, probably because of their greater financial capacities and employment opportunities. Conversely, the most materially deprived households have no move-out intentions given their limited financial capacities and opportunities. Moreover, within polluted areas, we observe that people who are not involved in community life have lower move-out intentions than people who are involved. To us, these findings suggest the disproportionate presence in polluted areas of socially excluded groups with no move-out intentions. Our results emphasize that the general equilibrium theorized by Tiebout [10] has still not been reached in our sample of polluted areas. However, between the two socioeconomic extremes, there is a lower-middle class with strong intentions to remain in polluted areas. As we previously suspected, one can assume that polluted areas in Southwestern Europe provide some amenities that particularly attract young families with moderate standards of living. Another interesting result underlines the importance of aging as a significant determinant of the intention to remain in polluted areas. This result is in line with the sociological literature. In the US context, Shriver and Kennedy ([43], p. 495) argue that "long-term residents express less concern over environmental hazards because they are far more attached to their local communities". In France, Flanquart, Hellequin and Vallet [8] observe that populations living alongside hazardous industrial sites feel a strong community attachment.
The main limitation of this study is that it measures local effects. Indeed, strictly speaking, our results are only valid for our study areas in Southwestern Europe. However, the industrial heterogeneity of our sample makes our estimates generalizable to a wide spectrum of polluted areas. Of course, further analyses of the determinants of socio-environmental segregation in Europe should be conducted in other contexts (e.g., different case studies and different outcome indicators).
The fact that hazardous polluted areas tend to become economically attractive for lower socioeconomic groups may be a serious public health issue, given that lower social classes tend to have more children. Additional studies should also assess the health and productivity effects of pollution exposure, as well as the influence of socioeconomic status on such effects. Finally, our results imply that health policies or recommendations for averting behavior should be targeted in particular toward lower-middle class families, which are the most likely to be attracted by employment and housing opportunities in polluted areas.

Data and Material
The database is anonymous and contains no personal information. All ethical standards concerning data collection and analysis were respected. Anonymized data is available upon request.

Appendix 1. The context of the study areas
The three study areas have different mining and industrial histories that make their comparison generalizable to a wide spectrum of pollution contexts. In the following, we describe some key characteristics of each area. Based on the CSPE database, Table 4 provides mean comparison tests between polluted sites and their respective control areas.
The Spanish Sierra Minera is an ex-mining site that was particularly active between 1957 and 1990 due to the activity of a multinational company. Soils in the area show high concentrations of zinc, lead, and cadmium. Since the decline of mining, few industrial alternatives have been set up and the development of tourism remains uncertain [44,45]. Comparing Portman and Estrecho de San Ginès (ESG) with other towns located to the west of Cartagena (control group), Table 4 shows that lower housing prices, household incomes, and employment rates characterize such ex-mining sites. Table 4 also identifies a lower perceived availability of services and retail outlets in these areas.
Alumbres is a small town located at the foot of the Sierra Minera (between Cartagena and La Union). This small town has prospered alongside the gradual development of a large petrochemical complex since 1950. Today, this industrial site includes an oil refinery, a gas plant, an electric power station that transforms fuel oils and gas, a factory producing white mineral oils, natural sulfonates and sulfuric acid, a fertilizer industry, and a producer of lubrication bases. Alumbres is exposed to winds carrying toxic heavy-metal residues. As shown in Table 4, there is no significant difference between Alumbres and Molinos Marfagones (control group) in terms of the price of housing, employment, unemployment, and perceived availability of retail outlets.
The Portuguese region of Estarreja has hosted an active industrial site since 1946. First, ammonia, chlorine-sodium, and PVC manufacturers settled in Estarreja in 1946, 1956, and 1960, respectively. Then, since 1977, several petrochemical industries have begun their activity. Today, Estarreja hosts six complementary industries producing a large number of chemical products and other derived goods. Water channels and ditches around the factory transport heavy metals and organic compounds. For instance, high concentrations of lead, mercury, arsenic, and benzene have been found in the area. On the other hand, the presence of the industrial complex has made the area more dynamic and has improved the average socioeconomic and demographic characteristics of the area [21,22]. As suggested by Table 4, the municipality of Estarreja has better average characteristics than
The case of Viviez in France marks the transition between zinc smelting and a modern industry based on the processing of zinc and the recycling of industrial wastes. In 1855, a zinc smelter settled in Viviez because of its proximity to coal mines and rail facilities. In 1871, an international company undertook large-scale industrialization of the site by extracting, transforming, and exporting zinc. In 1922, the site became a pioneer in adopting electrolytic techniques to chemically extract zinc from zinc blende. Although zinc extraction ended in 1987, the company continues to process zinc. In addition, the company helped develop new industries, namely the recycling of cadmium residuals and plastics. As a result of the zinc smelting activity, the soils in the area are contaminated with high concentrations of lead, cadmium, and arsenic [23].
Considering the economic and community indicators listed in Table 4, Viviez looks more like Portman/ESG (an ex-mining site) than Alumbres and Estarreja (active industrial sites). Compared to Montbazens (control area), Viviez has lower housing prices, household incomes, and employment rates, as well as a lower perceived availability of public services and retail outlets.
Additional descriptive statistics about the study areas are provided in Table 5. Table 5 shows that "professional reasons" are particularly mentioned as a motivation for living in Alumbres and Estarreja. This result is not surprising given the presence of an active industry supplying various jobs in both areas. Table 5 also shows that residents of Viviez and Portman/ESG more often report having chosen to live there for "interesting economic reasons" (e.g., affordable housing prices) than residents of non-polluted areas, who more often mention "environmental amenities" as a residential motivation. Furthermore, strong community attachment and satisfaction are characteristics that are highlighted in several active industrial sites in Europe. However, perceived area attractiveness is significantly lower in these active sites (Viviez and Alumbres) than in their respective control groups. This might be due to the presence of smoke, smells, and noise. Figure 5 summarizes the main reasons for living in the area: "social ties" play an important role in all study areas, even more than "professional reasons" and "economic reasons". Table 5 also shows stronger intentions to move out among residents of Viviez and Alumbres than in their respective control groups. In these two municipalities, "environmental issues" appear as the most frequently mentioned motivation for moving out. In contrast, "environmental issues" do not appear as a specific move-out issue in Portman/ESG and Estarreja, compared to their respective counterparts. On the other hand, area attractiveness perceptions in Portman/ESG and Estarreja do not differ from those in their respective control areas. This absence of a significant gap is not so surprising since, in both areas, there are nearby environmental amenities that residents can enjoy despite the presence of pollution (e.g., lagoon, sea, or ocean).
In addition, pollution is not directly visible in Portman/ESG (i.e., mines closed since 1990) and can be relatively far away in Estarreja for people living in freguesias other than Beduido & Veiros (where the chemical complex is located). Likewise, Table 5 does not show significant differences in move-out intentions in Portman/ESG and Estarreja, compared to their respective control groups. Figure 6 summarizes the main reasons for moving out mentioned by the respondents in each area. Figures 7 and 8 show that pollution perception is always higher in polluted areas than in control areas. Figure 7 underlines that pollution perception increases with income, except for the highest income quantile in polluted areas. Figure 8 underlines that pollution perception in polluted areas grows with the length of residence.
The scale varies from 1 "a very low willingness to live in a place that may reduce life expectancy by 5 years" to 5 "a very high willingness to take this risk".

Appendix 2. Additional materials
Number of members who obtained a high-school grade: Number of household members who obtained a high-school diploma
Wealth index (0-to-7 score): Wealth index is the sum of the following owned (or not) assets: former house, second house, car, air conditioner, computer, cellphone, and financial assets. Thus, the wealthiest households have a score of 7 while the most deprived households have a score of 0.
(1) Standard errors are robust to intra-group correlation. Significance levels are ***1%, **5%, *10%.
(2) Wealth index is the sum of the following owned (or not) assets: former house, second house, car, air conditioner, computer, cellphone, and financial assets. Thus, the wealthiest households have a score of 7 while the most deprived households have a score of 0. Incomes are in US$ PPP.
Figure 9 Length of residence and height in polluted and control areas
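The two composite indexes described in the table notes are simple sums, and a short sketch may make their construction concrete. The function names, the dictionary keys for the seven assets, and the range check are hypothetical choices made for illustration. Only the list of assets, the 0-to-7 and 3-to-15 ranges, and the log(score + 1) transformation come from the text.

```python
import math

# The seven assets listed in the wealth index note (key names are hypothetical).
ASSETS = (
    "former_house", "second_house", "car", "air_conditioner",
    "computer", "cellphone", "financial_assets",
)


def wealth_index(owned):
    """Count owned assets (0 to 7); `owned` maps asset name to True/False."""
    return sum(1 for asset in ASSETS if owned.get(asset, False))


def log_wealth_index(score):
    """Log-transform after adding 1, so that a score of 0 maps to 0."""
    return math.log(score + 1)


def attractiveness_index(global_attractiveness, public_services, shops):
    """Sum three 1-to-5 perception scores, giving a value between 3 and 15."""
    scores = (global_attractiveness, public_services, shops)
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each perception score must lie between 1 and 5")
    return sum(scores)


household = {"car": True, "cellphone": True, "computer": True}
print(wealth_index(household))
print(log_wealth_index(wealth_index(household)))
print(attractiveness_index(4, 3, 2))
```

Adding 1 before taking the logarithm keeps households with no assets in the estimation sample, which is the motivation given in the table notes.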

Declarations
(2) Wealth index is the sum of the following owned (or not) assets: former house, second house, car, air conditioner, computer, cellphone, and financial assets. Thus, the wealthiest households have a score of 7 while the most deprived households have a score of 0.
(3) Area attractiveness index is the sum of the following 1-to-5 perception scores: global attractiveness, availability of public services, and availability of shops and retail outlets. Thus, areas perceived as the most attractive have a score of 15, while areas perceived as the most deprived have a score of 3.