Patterns of MHC-dependent mate selection in humans and nonhuman primates: a meta-analysis

Genes of the major histocompatibility complex (MHC) in vertebrates are integral for effective adaptive immune response and are associated with sexual selection. Evidence from a range of vertebrates supports MHC-based preference for diverse and dissimilar mating partners, but evidence from human mate choice studies has been disparate and controversial. Methodologies and sampling peculiarities specific to human studies make it difficult to know whether wide discrepancies in results among human populations are real or artifactual. To better understand what processes may affect MHC-mediated mate choice across humans and non-human primates, we performed phylogenetically controlled meta-analyses using 58 effect sizes from 30 studies across 7 primate species. Primates showed a general trend favoring more MHC-diverse mates, which was statistically significant for humans. In contrast, there was no tendency for MHC-dissimilar mate choice, and for humans, we observed effect sizes indicating selection of both MHC-dissimilar and MHC-similar mates. Focusing on MHC-similar effect sizes only, we found evidence that preference for MHC-similarity was an artifact of population ethnic heterogeneity in observational studies but not among experimental studies with more control over socio-cultural biases. This suggests that human assortative mating biases may be responsible for some patterns of MHC-based mate choice. Additionally, the overall effect sizes of primate MHC-based mating preferences are relatively weak (Fisher's Z correlation coefficient for dissimilarity Zr = 0.044, diversity Zr = 0.153), calling for careful sampling design in future studies. Overall, our results indicate that preference for more MHC-diverse mates is significant for humans and likely conserved across primates.


Introduction
The major histocompatibility complex (MHC), an extraordinarily polymorphic and ancient chromosomal region shared by virtually all vertebrates, is the most likely candidate for "good genes" (genes with fitness benefits) due to its involvement in both immune defense and mate choice (Potts & Wakeland 1990; Hedrick 1994; Bernatchez & Landry 2003; Milinski 2006). The MHC encodes molecules that bind specific self- and pathogen-derived peptides and present these to T lymphocytes, thus initiating appropriate immune activation (Hughes & Yeager 1998). Class I molecules mainly bind intracellular pathogen peptides (e.g., viruses and bacteria) and are expressed by all nucleated cells, whereas class II molecules mainly bind extracellular parasite peptides (e.g., helminths, ectoparasites) and are expressed by professional immune antigen-presenting cells, such as mononuclear phagocytes or T-cells (Knapp 2005).
Infectious agents are thought to be the strongest selective force shaping human evolutionary history (McMichael 2001; Fumagalli et al. 2011; Karlsson et al. 2014) and continue to have strong effects on fitness. For example, humans are currently known to be infected by over 1,400 parasitic species (Taylor et al. 2001), and in 2010 parasites were responsible for nearly 64% of global deaths in children younger than 5 years (Liu et al. 2012). Parasite-mediated balancing selection in humans is identifiable at both broad and fine scales. At the population level, parasite-mediated balancing selection is indicated by spatial patterns of MHC polymorphism increasing with virus species richness (Prugnolle et al. 2005) and greater frequencies of protective MHC alleles in areas with greater parasite risk (Hill et al. 1991).
Selection at the individual level is indicated by resistance of rare MHC genotypes to specific strains of

Accepted Article
This article is protected by copyright. All rights reserved.
While it is clear that population MHC allelic diversity is associated with pathogen diversity and prevalence, there is also theoretical and empirical support for the role of sexual selection in maintaining heterozygosity and allelic variation across the MHC (Potts et al. 1991; Hedrick 1992; Potts et al. 1994; Jordan & Bruford 1998; Penn & Potts 1999; Winternitz et al. 2013; Ejsmond et al. 2014). Proximate mechanisms enabling MHC-mediated mate choice include odor and visual cues of MHC composition. Animals can discriminate between the different volatile peptides bound by products of specific alleles that contribute to body odors (Boyse et al. 1983; Potts et al. 1994; Carroll et al. 2002; Penn 2002; Leinders-Zufall et al. 2004; Milinski et al. 2005; Milinski et al. 2013). Additionally, visual cues such as the expression of sexually dimorphic conspicuous traits, or even condition-related behavior, can be indicators of immune genotype (Hamilton & Zuk 1982; Folstad & Karter 1992).
Ultimate mechanisms for MHC-mediated mate choice can take three non-mutually exclusive forms (Piertney & Oliver 2006). (1) Preferences for MHC diversity (measured in terms of heterozygosity, or the number of MHC alleles) in mates could provide direct fitness benefits (e.g., healthier mates are better providers and less infectious to their mates and offspring) and/or indirect benefits if rare alleles, more likely to be carried by heterozygotes or by individuals harboring many alleles, can be passed on to offspring (e.g., "good-genes-as-heterozygosity"; Brown 1997, 1999), although this remains to be demonstrated empirically. Thus, preferences for MHC diversity in mates could also increase offspring levels of heterozygosity in structured, finite populations (Fromhage et al. 2009). (2) Preferences for specific MHC genotypes in potential mates could provide the direct benefits mentioned previously and also indirect benefits if genes that are protective against contemporary parasites can be passed to offspring. Protective alleles are thought to be rarer in the population (Slade & McCallum 1992) and so more likely to be carried by heterozygotes. Both preferences for heterozygosity and preferences for specific resistance genotypes can broadly fit under the category of preferences for MHC diversity for our current study. (3) Preferences for MHC dissimilarity/complementarity in mates would provide indirect fitness benefits by increasing immunodiversity of offspring (Tregenza & Wedell 2000). Alternatively, preferences for MHC-dissimilar mates may be independent of immunogenetic benefits for offspring if MHC alleles serve as markers of relatedness in mates. This would provide indirect fitness benefits by avoiding consequences of close inbreeding (Yamazaki et al. 1988). If preference for dissimilar mates is for immunogenetic indirect benefits, then the extent of dissimilarity of the potential mate may matter. The optimality hypothesis proposes that offspring may benefit most from optimal rather than maximal immunodiversity (Wegner et al. 2003b; Milinski 2006), because having too many different alleles could actually deplete autoreactive T-cells that are required for immune response (Nowak et al. 1992; Woelfing et al. 2009). In this case, the degree and composition of optimal MHC diversity for offspring is expected to primarily depend on the diversity of parasites in the environment (Wegner et al. 2003b; Milinski 2006; Eizaguirre et al. 2009).
Instead of mediating mate choice preferences, the MHC may be incidentally associated with signals of overall condition and individual vigor that are dependent on genome-wide heterozygosity (Brown 1997). Similarly, MHC allele frequencies that vary by population may simply correlate with phenotypic cues of genetic relatedness. The absence of a correlation between MHC and neutral variation would support MHC-mediated mate choice, though the presence of a correlation would not necessarily preclude the MHC's role.
A recent meta-analysis found broad evidence of MHC-associated mate choice in non-human species, with stronger support for diversity preferences than for dissimilarity preferences (Kamiya et al. 2014), but results from human studies have been more equivocal and contentious (reviewed in Havlicek & Roberts 2009; Winternitz & Abbate 2015). This is in large part due to the greater variation inherent in human research because many potentially confounding aspects of the study design are harder to control.
The most problematic issue is likely hidden yet substantial admixture between populations, which can result in spurious assortative or disassortative genetic patterns in pairing (Redden & Allison 2006; Solberg et al. 2008; Havlicek & Roberts 2009). These patterns of genetic similarity/dissimilarity between partners can arise because autosomal and MHC genetic variation is structured by ethnicity (Rosenberg et al. 2002; Vina et al. 2012), where the population frequencies of MHC alleles depend on the geographical location and on the level of population heterogeneity (Prugnolle et al. 2005; Solberg et al. 2008; Vina et al. 2012).
A second confounding variable unique to humans is technological alteration of biological phenotypes, which includes hygiene and makeup routines, surgery, and hormonal contraceptives (Wedekind et al. 1995). However, perfumes appear to be chosen to amplify one's MHC odor profile (Milinski & Wedekind 2001). Lastly, it may sometimes be difficult to standardize experimental designs between studies when using human subjects, and these methodological differences can also confound results (Havlicek & Roberts 2009). Several recent reviews have discussed these issues in attempts to reconcile significant and non-significant results from over three decades of human MHC mate choice research (Havlicek & Roberts 2009; Winternitz & Abbate 2015). However, only quantitative assessment that explores the magnitude and precision of effects can reveal the ultimate biological importance of phenomena (Nakagawa & Cuthill 2007). Thus, there is a need for a quantitative comparison of human MHC studies. For evolutionary context, non-human primates should be relatively free from the confounding biological and technological aspects of human studies and thus should reflect human evolutionary origins more closely than contemporary human populations do.
To determine the biological importance of MHC-based mating across human populations and uncover drivers of mate selection, we performed a phylogenetically controlled meta-analysis of published studies. We tested for biologically significant mate choice for MHC-dissimilarity and MHC-diversity separately. We aimed to identify consistent relationships between MHC and mating across primates, including non-human primates, to put human results in context. In addition, we tried to disentangle potential biological or methodological sources of variation in the observed effect sizes to better understand differences between past studies and minimize differences for future studies.


Literature Search
Dataset compilation methods are described in detail in Winternitz and Abbate (2015). Briefly, studies were compiled from the reviews by Havlicek and Roberts (2009) and Setchell and Huchard (2010) and from the meta-analysis by Kamiya et al. (2014); additional studies published from 2009 to January 2015 were identified via Web of Science using the topic "MHC" and "Major Histocompatibility Complex" and "mate choice" or "mate selection" or "mate preference" and searching within results for "human" and "primate".
Studies were listed as testing for human or primate preferences, and for preferences for MHC dissimilarity or diversity/heterozygosity. Studies were included if MHC genotypes (or their approximations via single nucleotide polymorphisms (SNPs), e.g., HapMap data) were obtained for the individuals tested. Studies were excluded if we could not extract the full set of effect sizes that related to the question of MHC influence on mating preferences (e.g., Giphart & D'Amaro (1983) did not provide test statistics for pairwise tests). We only considered classical MHC Class I and Class II genes since non-classical MHC genes, although also involved in immunity, usually have tissue-specific expression and much less diversity, indicating different selection schemes (reviewed in Rodgers & Cook 2005). For example, Khankhanian et al. (2010) provided data for non-classical HLA-E, which we did not use.
Similarly,  only presented data for non-classical HLA-L and HLA-J genes. Lists of full references, including references we could not use and explanations for exclusion, are provided in the supplementary materials.

Data extraction and effect size calculation
We chose r effect size (correlation coefficients) as the measure of the association between MHC-target (dissimilarity or diversity) and the strength of mating preference/outcome. Studies had mostly measured dissimilarity as categories of allele-sharing (e.g., none versus ≥1) and occasionally as allele sequence divergence. Diversity was mostly measured categorically (i.e., homozygous at one or more loci versus heterozygous) and occasionally continuously (mean heterozygosity over all loci considered). Test statistics other than measures of correlation were converted to r effect sizes following Nakagawa & Cuthill (2007); while there are limitations to converting r from summary statistics other than bivariate correlations (Aloe 2015), we stress that these limitations will introduce noise and not bias. Results from Yang et al. (2014) could not be directly converted to effect sizes, so we calculated values for the data points (points were extracted from Figure 5 using the software Datathief (Tummers 2006)) and fit correlation models to obtain effect size estimates in the desired scale. When studies provided multiple effect sizes that we could not independently evaluate with moderator variables (e.g., results from multiple loci or MHC allele and supertype data), we calculated weighted means by first converting measures to r and then weighting them by the underlying sample sizes. We accounted for the non-independence of multiple effect sizes from the same study that remained after calculating weighted means by modeling the hierarchical structure of the data in the appropriate statistical models (see the structure of the mixed model in the next section). For studies that listed effect sizes for MHC-similarity preferences, we reversed the sign (i.e., resulting in a negative association for dissimilarity) to align data for preference for dissimilar mates according to the predictions of our focal biological hypothesis. The number of raters was recorded to test for potential effects of sample size on the resulting effect size. The number of individuals rated (number of independent repeats) in the study was recorded to calculate the variance in effect size (variance = 1/(N − 3), where N is the number of individuals rated in the study).
When weighted effect size means were calculated, we also recorded the mean number of individuals rated, and used this estimate to calculate the variance of the weighted mean.
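The pooling steps described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code; the function names and the per-locus numbers are hypothetical and are not taken from any study in the dataset.

```python
def weighted_mean_r(r_values, n_values):
    """Sample-size-weighted mean of correlation-type effect sizes."""
    total_n = sum(n_values)
    return sum(r * n for r, n in zip(r_values, n_values)) / total_n


def sampling_variance(n_rated):
    """Sampling variance as stated in the text: 1 / (N - 3)."""
    if n_rated <= 3:
        raise ValueError("need more than 3 rated individuals")
    return 1.0 / (n_rated - 3)


def align_to_dissimilarity(r_similarity):
    """Reverse the sign of an effect reported as a similarity preference."""
    return -r_similarity


# Hypothetical per-locus results from a single study (illustrative only).
r_per_locus = [0.10, 0.20, -0.05]
n_per_locus = [50, 30, 20]

mean_r = weighted_mean_r(r_per_locus, n_per_locus)
# The mean number of individuals rated is used for the variance of the pooled value.
mean_n = sum(n_per_locus) / len(n_per_locus)
var_mean = sampling_variance(mean_n)
```

Weighting by sample size lets the larger per-locus samples dominate the pooled value, while the variance is based on the mean number of individuals rated, as described in the text.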
Raw data and converted effect sizes were checked by independent extraction (JA) and any inconsistency was discussed until a consensus was reached (between JW, JA, and LG). We converted effect sizes into Fisher's Z (Zr) to stabilize variance across effect sizes, and Zr and its variance (defined above) were used for meta-analyses. The full dataset, comprising 58 effect sizes from 30 studies across 7 species, and effect size extractions and conversions are provided in the electronic supplementary material. The dataset was split to test for MHC dissimilarity and diversity-mediated mating preferences across primates (N=41 and 17, respectively). Human studies greatly outnumber those of other species, which may lead to biases in the observed effect size patterns. Therefore, we also analyzed them separately from non-human primates.
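For readers unfamiliar with the transformation, Fisher's Z is the inverse hyperbolic tangent of r. The short Python sketch below (function names are illustrative) shows the conversion and its inverse, which is useful for reading a reported Zr back on the correlation scale.

```python
import math


def r_to_zr(r):
    """Fisher's Z transformation: Zr = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


def zr_to_r(zr):
    """Back-transform Fisher's Z to a correlation coefficient."""
    return math.tanh(zr)


# Zr values quoted in the abstract, read back on the r scale.
r_dissimilarity = zr_to_r(0.044)
r_diversity = zr_to_r(0.153)
```

For effects this small, r and Zr are nearly identical; the two scales diverge noticeably only for larger correlations.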
Biological and methodological differences between studies have been shown to predict variation in MHC-mediated mating patterns in human and non-human populations (Havlicek & Roberts 2009; Setchell & Huchard 2010; Kamiya et al. 2014). We accounted for these potential sources of heterogeneity by considering various moderator variables for the partition of the between-study variance in effect sizes.
The following data were extracted from each study as methodological predictors: (1) study ID and (2) year of publication for publication bias testing, (3) choice cue used for mating preference (i.e., facial attractiveness, odor attractiveness, or mate choice outcome), (4) the number of individual raters (N of rater), (5) multi-locus or single locus, and (6) level of population heterogeneity (dichotomously classified as ethnically homogeneous or heterogeneous). This last moderator was included to control for artefactual patterns of MHC similarity or dissimilarity that can arise from the pooling of different ethnic groups together in the same sample (Rosenberg et al. 1983). Populations were classified as 'ethnically homogeneous' if the study samples fell into single ethnic groups according to the ethnic categories defined by the CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/index.html), or based on detailed genealogical (i.e., Ober et al. 1997) or anthropological records (i.e., Hedrick & Black 1997). All other human populations were classified as 'ethnically heterogeneous'. All non-human primate populations were considered 'homogeneous' as they most likely represented single isolated populations (Setchell & Huchard 2010). (7) Contraceptive pill use can potentially reverse previous preferences (Wedekind et al. 1995), so we ran all models excluding pill-use effect sizes (N = 4), but results remained essentially unchanged when all effect sizes were used (both sets of results provided in all tables). We included pill use as a moderator for human odor preference studies (female pill users, female non-pill users). Biological predictors included (8) species, (9) choosy sex (i.e., the unit of investigation: males, females, or pairs), (10) MHC class (Class I, Class II, or both), and (11) relative testes size as a proxy for mating system (Harcourt et al. 1981; Harcourt et al. 1995; Dixson & Anderson 2004; Figure S1). Mating system strongly impacts individual mating strategies and population genetic structure (Sugg et al. 1996) and so could influence the expression of MHC-mediated mate choice (Setchell & Huchard 2010).
Unfortunately, background genetic measures of dissimilarity and diversity were not available for the majority of studies in our dataset, so we could not include this potentially important moderator in our analyses. However, we were able to extract a limited number of effect sizes based on neutral markers, which were found to show positive but non-significant correlations with MHC-based effect sizes (MHC dissimilarity correlation (95% highest posterior density, HPD) = 0.164 (-0.421 to 0.747), N=11; MHC diversity correlation (95% HPD) = 0.110 (-0.708 to 0.942), N=5, Figure S11; see Supplementary Text for analysis details). This suggests that MHC-mediated effects were largely independent of genome-wide effects.

Meta-analytic procedures
There were three causes of non-independence in our datasets: 1) more than one effect size was extracted from a study, 2) multiple effect sizes were available for the same species, and 3) species share evolutionary history, making effect sizes confounded by the phylogeny of species. We used phylogenetic mixed-effects modeling that includes random effects to account for non-independence caused by study-, species-, and phylogeny-specific effects (Hadfield 2010; Hadfield & Nakagawa 2010; Nakagawa & Santos 2012). A phylogenetic tree was obtained by trimming the primate 10KTree (Arnold et al. 2010) using the drop.tip function from the ape package v3.4 (Paradis et al. 2004). The Deviance Information Criterion (DIC) was computed for all models considering different random effect structures (Table 1), and top model selection was based on DIC values, where the lowest DIC is considered the best, models within 2 DIC units are considered equivalent, and a change in DIC of 4 or more significantly improves prediction (Spiegelhalter et al. 2002). We calculated phylogenetic heritability or phylogenetic signal, H², as the proportion of total variance in Zr that can be explained by phylogenetic variance (Hadfield & Nakagawa 2010; de Villemereuil & Nakagawa 2014), equivalent to Pagel's λ (Pagel 1999). Our datasets of MHC-dissimilarity and MHC-diversity were limited (Dissimilarity N = 41; Diversity N = 17) and comprised 28 and 11 studies, respectively, and 7 species each. Preliminary exploratory analysis of the data revealed that the factors study ID and species were strongly associated, with each study focusing on a single species. We compared models for all combinations of the random factors: study ID, species, and phylogeny. We found that including study ID greatly improved model fit, but models adding species, phylogeny, or both were all within 1 DIC unit of the top model (Table 1) and essentially equivalent in terms of prediction (Spiegelhalter et al. 2002). Therefore, to avoid potentially overfitting the models, we chose to include only study ID and phylogeny to control for phylogenetic pseudoreplication in multi-species models, and to include study ID in human-only models.
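The two quantities used for model comparison here, the DIC decision rule and H², can be illustrated with a small Python sketch. It assumes that "total variance" means the sum of the phylogenetic, study, and residual variance components (sampling variance excluded), and that H² is computed per posterior draw and then summarized; both are assumptions for illustration. The draws and the wording of the returned labels are hypothetical.

```python
from statistics import mean


def phylogenetic_heritability(var_phylo, var_study, var_residual):
    """Share of total variance attributed to phylogeny (assumed decomposition)."""
    total = var_phylo + var_study + var_residual
    return var_phylo / total


def compare_dic(dic_a, dic_b):
    """Apply the DIC thresholds quoted in the text to two models."""
    delta = abs(dic_a - dic_b)
    if delta <= 2:
        return "equivalent"
    if delta >= 4:
        return "lower-DIC model clearly preferred"
    # The text does not name the 2-4 range; this label is an assumption.
    return "lower-DIC model weakly preferred"


# Hypothetical posterior draws: (phylogeny, study, residual) variances.
draws = [(0.02, 0.03, 0.05), (0.01, 0.04, 0.05), (0.05, 0.02, 0.03)]
h2_draws = [phylogenetic_heritability(*d) for d in draws]
h2_mean = mean(h2_draws)
```

Because variance components cannot be negative, every H² draw lies between 0 and 1, which is why the posterior of H² cannot overlap zero even when phylogenetic signal is negligible.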

Meta-analyses were conducted with generalized linear mixed-effect models with Markov Chain Monte Carlo techniques using the R package MCMCglmm (Hadfield 2010). We present details on MCMCglmm model specification and diagnostics in the supplementary material. Briefly, all models were fit using an uninformative inverse gamma prior for all random effects and residuals, and were checked for sensitivity to prior specification and convergence across independent model runs (following Wilson et al. 2010). Each model was run for 3 million iterations, sampling every 500 iterations after discarding the first one million, and this process was repeated for each model to confirm stability of results. We first ran intercept-only mixed models (with random effects) to determine the mean effect size across all studies and for humans and non-human primates separately. We tested if specifying priors based on the effect size estimates from mammalian MHC mate choice (Kamiya et al. 2014) would improve model fit and reduce variance around posterior estimates (Garamszegi et al. 2009), but results were essentially identical to those obtained with uninformative priors (Table S1, S2, S6).
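As a quick arithmetic check of the chain settings, the sketch below computes the number of posterior samples retained per run. It assumes that the 3 million iterations include the discarded one million, which is how MCMCglmm counts its total iterations relative to burn-in; the helper name is illustrative.

```python
def retained_samples(total_iterations, burn_in, thin):
    """Posterior samples kept after burn-in, storing every `thin`-th iteration."""
    return (total_iterations - burn_in) // thin


# Settings quoted in the text, assuming burn-in is part of the 3 million.
n_samples = retained_samples(3_000_000, 1_000_000, 500)  # 4000
```

If the burn-in were instead additional to the 3 million iterations, the same thinning interval would retain 6,000 samples.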

Heterogeneity estimation
Variation in observed effect sizes between studies is composed of both real differences in mating outcome (effect size heterogeneity) and random error. To estimate effect size heterogeneity we used I² (Higgins & Thompson 2002; Higgins et al. 2003) modified for multilevel meta-analytic models (Nakagawa & Santos 2012). Low, moderate and high heterogeneities refer to I² of 25%, 50% and 75%, respectively (Higgins et al. 2003). The intercept-only models indicated high heterogeneity (>75%) in effect sizes, and while this was mostly explained by the random effects of study ID and phylogeny, substantial residual heterogeneity persisted (26% for dissimilarity, 14% for diversity, Table 2).
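A minimal Python sketch of a multilevel I² calculation is given below. It assumes the common formulation in which each variance component is divided by the sum of all components plus a "typical" sampling variance computed from the inverse-variance weights (after Higgins & Thompson 2002); the component names and numbers are hypothetical, and the authors' exact implementation may differ.

```python
def typical_sampling_variance(variances):
    """Typical within-study variance from inverse-variance weights."""
    k = len(variances)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (k - 1) * sum_w / (sum_w ** 2 - sum_w2)


def multilevel_i2(components, sampling_var):
    """Return per-component I2 and total I2 as proportions of total variance."""
    total = sum(components.values()) + sampling_var
    shares = {name: value / total for name, value in components.items()}
    return shares, sum(shares.values())


# Hypothetical variance components and sampling variances (illustrative only).
components = {"study_id": 0.02, "phylogeny": 0.01, "residual": 0.01}
sampling_var = typical_sampling_variance([0.01, 0.01, 0.01, 0.01])
shares, i2_total = multilevel_i2(components, sampling_var)
```

With these inputs the sampling variance takes 20% of the total, so total I² is 80%, split 40/20/20 among study ID, phylogeny, and residual.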
We next constructed a series of meta-regression models to identify the most important moderators (listed above) that explained substantial residual heterogeneity in effect sizes (Nakagawa & Santos 2012).
We conducted univariate fixed-effect mixed models to estimate the mean effect size for each moderator separately (we avoided complex models with multiple predictors given the limited sample size). Models with categorical moderators were run without the intercept to test each trait against no effect. Parameter estimates were based on posterior means, and estimates with highest posterior density (HPD) intervals that do not cross zero are inferred to represent real effects. All effect sizes are reported as Fisher's normalized correlation coefficients (Zr) with 95% confidence intervals. In the ecological literature, r ≈ 0.1 (Zr ≈ 0.10) is generally considered a small effect, r ≈ 0.3 (Zr ≈ 0.31) a medium effect and r ≈ 0.5 (Zr ≈ 0.55) a strong effect (Cohen 1988; Møller & Jennions 2002).
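The benchmark values quoted above can be verified directly, and the same thresholds can be used to label a reported Zr. The Python sketch below does both; the `label_effect` helper and its "below small" label are hypothetical conveniences, not part of the original analysis.

```python
import math

# Benchmarks on the r scale, as quoted in the text.
benchmarks_r = {"small": 0.1, "medium": 0.3, "strong": 0.5}

# The same benchmarks on the Fisher's Z scale, rounded to two decimals.
benchmarks_zr = {name: round(math.atanh(r), 2) for name, r in benchmarks_r.items()}
# benchmarks_zr == {'small': 0.1, 'medium': 0.31, 'strong': 0.55}


def label_effect(zr):
    """Return the largest benchmark reached by |zr| (hypothetical helper)."""
    label = "below small"
    for name, r in benchmarks_r.items():  # insertion order: small < medium < strong
        if abs(zr) >= math.atanh(r):
            label = name
    return label
```

Applied to the overall estimates reported in the abstract, `label_effect(0.044)` gives "below small" and `label_effect(0.153)` gives "small", in line with the description of these effects as relatively weak.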

Publication bias and power analysis
We tested for publication bias using four different approaches, given that they have different advantages and disadvantages. First, we applied Egger's regression (Egger et al. 1997) on meta-analytic residuals instead of effect sizes, which can better distinguish between publication bias and other sources of heterogeneity (Egger et al. 1997; Sutton et al. 2011; Kamiya et al. 2014). If the regression of the standard normal deviate (residuals divided by the standard error) on precision has an intercept different from zero at 90% confidence, then there is evidence of bias favoring publication of less precise yet significant results (Egger et al. 1997). Second, we tested for temporal bias in publication results (e.g., if non-significant studies are suppressed immediately after the first significant publication) by including the publication year of the study as a moderator in the meta-analytic model. We also used Spearman's rank correlation to test for significant correlations between effect size and year of publication. Third, to assess the impact of publication bias and test the robustness of our results, we used the nonparametric trim and fill method (Duval & Tweedie 2000a, b) in the metafor R package (Viechtbauer 2010). This method adjusts the mixed-model intercept for potentially missing studies and the difference is added to the original meta-analysis model intercept (and credible interval) (following Sutton et al. 2011). Fourth and finally, as bias for publications with significant results can rely more on the p-value than on the effect size, we used the p-curve method to test if the distribution of significant p-values, the 'p-curve', indicates that our studies have evidential value and are free from "p-hacking" (Simonsohn et al. 2014a, b). While problems in identifying publication bias using the p-curve method have been identified (Bishop & Thompson 2016; Bruns & Ioannidis 2016), we controlled for false negative results by ensuring that all data entered into analysis met the three required assumptions set by Simonsohn et al. (2014a, b). Specifically, these assumptions are that p-values are (1) associated with the hypothesis of interest, (2) statistically independent from other selected p-values, and (3) distributed uniformly under the null hypothesis of no bias.
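The Egger-type regression described above can be sketched in plain Python as an ordinary least-squares fit of the standard normal deviate on precision. This is an illustrative implementation under that reading of the method, not the authors' code; the function name and input values are hypothetical.

```python
import math


def egger_regression(residuals, std_errors):
    """Regress residual/SE on 1/SE; return (intercept, slope, SE of intercept)."""
    y = [e / s for e, s in zip(residuals, std_errors)]  # standard normal deviates
    x = [1.0 / s for s in std_errors]                   # precision
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 effect sizes")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical meta-analytic residuals and their standard errors.
example_residuals = [0.12, -0.05, 0.30, 0.02, -0.10]
example_ses = [0.10, 0.15, 0.30, 0.12, 0.20]
intercept, slope, se_intercept = egger_regression(example_residuals, example_ses)
```

The intercept divided by its standard error would then be compared against a t distribution with n − 2 degrees of freedom at the 90% confidence level stated in the text; that final step is omitted here to keep the sketch dependency-free.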
We tested the robustness of our results by conducting retrospective power analyses to evaluate whether our sample size (number of effect sizes) was sufficient to have a high chance of detecting a biologically significant effect. We used a pre-specified effect size of 0.15 (explaining 2.2% of the variation in mating patterns), which fell within the observed range of effect size estimates from the meta-analysis of Kamiya et al. (2014) that investigated MHC-mating patterns across vertebrates (Dissimilarity Zr (HPD) = 0.064 (-0.080 to 0.193); Diversity Zr = 0.113 (-0.004 to 0.237)). We used this pre-specified effect size to represent the minimal biologically significant effect and we used the observed mean variance of the effect sizes following recommendations of Thomas (1997). We conducted our power analyses for meta-analytic random-effects models for low, medium, and high levels of heterogeneity using the methods of Hedges & Pigott (2001).
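A simplified version of this power calculation is sketched below in Python. It assumes a two-sided z-test of the random-effects mean with equal sampling variances across effect sizes, and it derives the between-study variance from the I² benchmarks of 25%, 50%, and 75% via τ² = v·I²/(1 − I²); the heterogeneity ratios actually used are not stated in the text, so this mapping is an assumption. The mean variance of 0.02 is a placeholder, not the observed value.

```python
import math
from statistics import NormalDist


def tau2_from_i2(i2, typical_variance):
    """Between-study variance implied by I2 (assumed mapping)."""
    if not 0 <= i2 < 1:
        raise ValueError("i2 must be in [0, 1)")
    return typical_variance * i2 / (1 - i2)


def random_effects_power(effect, typical_variance, k, i2, alpha=0.05):
    """Power of a two-sided z-test for the random-effects mean over k effect sizes."""
    tau2 = tau2_from_i2(i2, typical_variance)
    se_mean = math.sqrt((typical_variance + tau2) / k)
    lam = effect / se_mean
    normal = NormalDist()
    crit = normal.inv_cdf(1 - alpha / 2)
    return 1 - normal.cdf(crit - lam) + normal.cdf(-crit - lam)


HYPOTHETICAL_MEAN_VARIANCE = 0.02  # placeholder, not the observed value
for k in (41, 17):  # numbers of dissimilarity and diversity effect sizes
    for label, i2 in (("low", 0.25), ("medium", 0.50), ("high", 0.75)):
        power = random_effects_power(0.15, HYPOTHETICAL_MEAN_VARIANCE, k, i2)
        print(f"k={k}, {label} heterogeneity: power={power:.2f}")
```

Power falls as heterogeneity rises because τ² inflates the standard error of the mean, and it falls with fewer effect sizes, which is why the smaller diversity dataset is the more demanding case.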
All statistical analyses except for the p-curve method (implemented at http://www.p-curve.com/) were carried out in the R environment (v.3.2.1) (R Core Team 2015) and all R code is provided in the

Results
The main results are presented in the next four sections and detailed results of our meta-regression analysis can be found in the supplementary materials (Tables S1-S17, Figures S1-S11).

Preference for MHC-dissimilarity
The mean effect size calculated over all studies (excluding contraceptive pill-users) indicated no significant correlation between MHC-dissimilarity and mating outcome (intercept-only posterior mean Zr (95% HPD) = 0.044 (-0.174 to 0.289), N = 37). The total heterogeneity (I²_total) in effect sizes was large (89%) and could mostly be explained by the two random factors (I²_ID = 37%, I²_phylogeny = 27%), with substantial residual variance remaining (I²_residual = 26%), shown in Table 2. We ran univariate models to identify moderators potentially explaining residual heterogeneity, and while no moderator was identified as significant (Table S1), the moderator "choosy sex" showed significant contrasts between the categories of males and pairs (contrast p = 0.02, Figure 1a, Table S11). In other words, studies using mated pairs had greater effect sizes for MHC-similar mates than studies investigating male preferences. This effect was driven by human studies, which had significant contrasts not found for non-human primates (human contrast p = 0.031, Table S11). Phylogenetic heritability in MHC-dissimilar mating patterns was low (mean H² = 0.29 (0.004 to 0.77), mode H² = 0.04, Figure 2a, Table 2); we note that random-effect variances are constrained to be positive, so their posterior distributions will never overlap zero (Wilson et al. 2010). Thus, the meaningfulness of the random effect of phylogeny cannot be based on its non-zero posterior distribution. Despite its effect of increasing the HPD intervals for model estimates, we retained phylogeny as a random effect to control for pseudoreplication in multispecies models. We investigated the impact of phylogeny on the meta-mean effect size of MHC-dissimilarity by comparing model DICs and posterior estimates for models with and without phylogeny, but results were qualitatively similar and non-significant (Table S13).
When human and non-human primate studies were examined separately, neither had significant associations between MHC-dissimilarity and mating outcome (human Zr = -0.022 (-0.107 to 0.073), N=31, non-human primate Zr = 0.109 (-0.194 to 0.404), N=6). Mean effect sizes calculated for these two taxonomic groups were not statistically differentiable. In intercept-only models, total heterogeneity was high for both humans and non-human primates (I²_total = 88% and 70%, respectively) with substantial residual variance in humans specifically (I²_residual = 36% versus 20% in primates, Table 2). Moderators tested separately in univariate models for humans, in models by human choice cue, and in models for primates were all non-significant and did not reduce residual heterogeneity (Tables S2-S4, S9, Figure 1a, Figure 3), but the direction of effect sizes for primates was consistent for dissimilarity (Figure 1a, Figure 2a). Phylogenetic heritability in MHC-dissimilar mating patterns among non-human primates was low but present (mean H² = 0.41 (0.003 to 0.92), mode H² = 0.09, Table 2).

Preference for MHC-similarity
As a post-hoc investigation to explain the large residual heterogeneity in effect sizes among the human dissimilarity dataset (I²_residual = 36%, Figure 4a), we specifically tested for the effects of population ethnic heterogeneity on patterns of MHC-assortative mating among experimental and observational studies (predicted by Rosenberg et al. 1983 and Havlicek & Roberts 2009). We ran mixed-effect models for the subset of dissimilarity effect sizes that were negative (indicating preferences for similarity). We found that preference for MHC-similarity was significant in ethnically heterogeneous population samples, but not among those that were homogeneous. This result was found only in observational (mate choice) studies, and not among those using experimental approaches (odor and facial preference combined) (Figure 5, Table S5).

Accepted Article
This article is protected by copyright. All rights reserved.

Preference for MHC-diversity
For diversity models, we present results for effect sizes (N=17) including one from contraceptive pill-using women because pill use was never shown to have an effect on preferences for MHC-diverse mates, its exclusion did not reduce heterogeneity, and model results are virtually identical whether this effect size was excluded or not (both results presented in all tables for comparison). Across all effect sizes combining humans and primates there was a non-significant trend for MHC-diversity to be positively associated with mating outcome (posterior mean Zr (HPD) = 0.128 (-0.064 to 0.373), N=17; Figure 4b shows significant raw mean Zr effect sizes and results from random effect models without accounting for study ID and phylogeny). Total heterogeneity was large (I² total = 64%) but was mostly accounted for by differences between studies and phylogeny (I² ID = 16%, I² phylogeny = 34%, Table 2). Residual heterogeneity was low (I² residual = 14%) and could be reduced only slightly by the addition of the moderator relative testes size as a fixed effect (Table S10). Univariate models of moderator effects were non-significant (Table S6). The model including relative testes size was also non-significant, but the negative association between relative testes size and preferences for MHC diversity was trending toward biological significance (intercept (HPD) = 0.246 (-0.019 to 0.581); posterior mean testes (HPD) = -0.218 (-0.479 to 0.047), Table S6). Phylogenetic heritability in strength of MHC-diversity mating patterns was moderate (mean H² = 0.50 (0.04 to 0.96), mode H² = 0.16, Table 2, Figure 2b).
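The heterogeneity partition reported above can be illustrated with a short sketch. It assumes, following the definition given in the Table 2 caption, that each I² value is one variance component divided by the sum of all variance components plus the mean sampling variance. The function name and the variance values are hypothetical; the values were chosen only so that the output matches the percentages reported for the combined diversity dataset and are not the fitted estimates.

```python
def i2_partition(components, mean_sampling_variance):
    """Return I2 (as a percentage) for each variance component and their total."""
    denominator = sum(components.values()) + mean_sampling_variance
    i2 = {name: 100.0 * value / denominator for name, value in components.items()}
    # Total heterogeneity is the sum of all non-sampling components.
    i2["total"] = sum(i2.values())
    return i2


# Hypothetical variance components (illustrative only, not model output).
components = {"study_id": 0.016, "phylogeny": 0.034, "residual": 0.014}
mean_sampling_variance = 0.036

result = i2_partition(components, mean_sampling_variance)
for name, value in result.items():
    print(f"I2 {name}: {value:.0f}%")
```

Under this definition the component percentages add up to the total, which is why 16%, 34% and 14% sum to the reported 64%.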
Examined separately, humans showed a significant association for more MHC-diverse individuals to be preferred as mates (humans = 0.153 (0.020 to 0.283), N = 10, Figure 1b) while non-human primates had a non-significant trend for mate choice for diversity (primates = 0.110 (-0.207 to 0.456), N=7, Figure 1b). Total heterogeneity was moderate for humans (I² total = 45%) and high for non-human primates (I² total = 79%), and mostly explained for primates by the random effects of study ID and phylogeny (I² ID = 26%, I² phylogeny = 36%, Table 2). Residual variance was not substantially reduced after the addition of moderators as fixed effects for humans (Table S10) and these mixed models were essentially equivalent to the intercept-only model (all ΔDIC < 2). The addition of the moderator relative testes size slightly reduced residual heterogeneity and DIC for non-human primates compared to the intercept-only model (Table S10), but not substantially (ΔDIC < 2). When examining categorical level differences in MHC-diversity effect sizes using univariate models, we found no significant moderator for non-human primates (Table S8). In contrast, humans showed stronger preferences for MHC-diverse mates for the categories of choosy sex (female rater) and MHC Class (when both classes were investigated together, using multiple loci) (Table S7, Figure 1b).
Raw Zr effect sizes for all primates were positively associated with preferences for MHC diversity and showed significant means when not accounting for study ID and phylogenetic pseudoreplication (Figure 4b). All our mixed models including phylogeny as a random effect had wide 95% confidence intervals for the phylogenetic heritability of effect sizes (proportion of total variance explained by phylogenetic variance, see Table 2). Thus, to test the stability of our results, we reran mixed models for combined datasets and for non-human primate datasets with the alternative random effects ID + species, and ID + species + phylogeny and found qualitatively similar results (Table S13), indicating our conclusions are robust to the random effect structure employed.

Publication bias and power analysis
We found no evidence for publication bias in the datasets. Egger's regression tests indicated the intercepts for MHC-dissimilarity and diversity datasets were not significant at 90% confidence intervals (Table S14, Figures S2 and S3). The slope for MHC dissimilarity was significantly negative, indicating that studies with larger sample sizes (and less variance) showed stronger preference for MHC similarity (Table S14, Figure S2a). This effect was largely due to human correlative studies that had large sample sizes but did not control for ethnicity and assortative mating biases. Trim and Fill analyses on mixed-model residuals (following Nakagawa & Santos 2012) were non-significant and suggested there were no missing studies on the left-hand side of the funnel plots (Table S14, Figures S2-3). The sensitivity adjustment of 0.002 to the original intercept-only mean for human MHC-diversity results would increase the significance (adjusted meta-mean (95% HPD) = 0.155 (0.022 to 0.285)). We found no evidence for temporal bias using year as a moderator, nor using Spearman's rank correlation between effect size and year of publication (all p>0.05, Table S16). Finally, P-curve analyses for MHC-dissimilarity, similarity, and diversity datasets suggest that studies contained significant evidential value and showed no evidence of intense p-hacking (Table S16). However, study sets in general had low likelihoods to detect a true effect (average power = 24% for dissimilarity, 57% for similarity, 33% for diversity).
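As an illustration of the logic behind Egger's regression, the sketch below implements the classical form of the test: the standardized effect (effect divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero suggests funnel-plot asymmetry. This is a simplified sketch on raw effect sizes, not the residual-based procedure following Nakagawa & Santos (2012) used in the analyses; the function name and the example data are hypothetical.

```python
from math import sqrt


def egger_regression(effects, variances):
    """Ordinary least squares of standardized effect on precision.

    Returns (intercept, slope). The intercept indexes funnel asymmetry;
    the slope estimates the effect in highly precise studies.
    """
    se = [sqrt(v) for v in variances]
    precision = [1.0 / s for s in se]
    standardized = [e / s for e, s in zip(effects, se)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical Zr values and sampling variances (illustrative only).
effects = [0.30, 0.12, -0.05, -0.10]
variances = [0.05, 0.02, 0.01, 0.005]
intercept, slope = egger_regression(effects, variances)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A full test would also need the standard error of the intercept to judge significance, which is omitted here for brevity.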
Power analyses on our meta-analytic results indicated that the combined primate and human datasets for MHC dissimilarity and diversity had high power (77.2% to 98.5%) to detect a biologically significant mean effect of 0.15 at an alpha value of 0.05 across low, medium, and high heterogeneity values (Table S17). Power was lower for non-human primate datasets (dissimilarity = 43.8%; diversity = 58.2% power at high heterogeneity) and a total of 17 and 13 effect sizes, respectively, would be required to reach 80% power for an overall effect size of 0.15 at alpha = 0.05 at high heterogeneity (Table S17).
Splitting the human dataset by experimental choice cue category, we should have high power to detect a biologically meaningful effect size, if it were present, for mate choice and facial preference sample sizes even at high heterogeneity (98.8% and 64.2% power, respectively). This indicates our non-significant meta-analytic results for the human mate choice and facial preference categories were likely not due to sample size limitations. In contrast, given the observed variation in effect sizes, we would need 84 odor preference effect sizes to detect biological significance at alpha = 0.05 and high heterogeneity (Table S17).
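The dependence of meta-analytic power on the number of effect sizes and on heterogeneity can be sketched with a standard normal approximation for the random-effects mean. The sketch assumes equal study sizes, a Zr sampling variance of 1/(n - 3), and between-study variance set to 1/3, 1 and 3 times the sampling variance for low, medium and high heterogeneity (one common convention). These choices, the function name, and the per-study sample size of 50 are assumptions for illustration and need not match the procedure behind Table S17.

```python
from math import sqrt
from statistics import NormalDist


def meta_power(mean_effect, k, n_per_study, tau2_ratio, alpha=0.05):
    """Two-sided power to detect mean_effect with k effect sizes."""
    v = 1.0 / (n_per_study - 3)      # sampling variance of one Zr
    tau2 = tau2_ratio * v            # between-study variance (assumed ratio)
    se = sqrt((v + tau2) / k)        # standard error of the meta-analytic mean
    lam = mean_effect / se           # noncentrality
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    return 1 - std.cdf(z_crit - lam) + std.cdf(-z_crit - lam)


# Hypothetical scenario: 17 effect sizes, 50 individuals per study.
for label, ratio in [("low", 1 / 3), ("medium", 1.0), ("high", 3.0)]:
    power = meta_power(0.15, k=17, n_per_study=50, tau2_ratio=ratio)
    print(f"{label} heterogeneity: power = {power:.2f}")
```

Power rises with the number of effect sizes and falls as heterogeneity grows, which is the pattern described for the non-human primate and odor preference datasets.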

Discussion
There is substantial evidence that immunity genes play direct and (less well-supported) indirect roles in mate choice across vertebrates (Kamiya et al. 2014). These roles could function to promote diverse immunological repertoires in mates and offspring to respond to diverse parasite attack. Yet nearly four decades of research into this phenomenon in humans has yielded puzzling results. Taking a quantitative meta-analysis approach to put these results in their evolutionary context, we sought to identify patterns of consistency across studies in humans and non-human primates. We found a suggestive trend for choice of
MHC-diverse mates across primates and clear support in humans, but inconsistency for MHC-dissimilar mating preferences.

Humans select more MHC diverse mates
Overall, we found a systematic trend for primates and a significant association for humans to prefer more heterozygous mates at MHC sites. In humans, congruent to findings from a range of non-human vertebrates (Kamiya et al. 2014), we found stronger evidence for female preferences for MHC-diversity, and when multiple MHC classes and loci were considered together. This may indicate that females receive greater evolutionary fitness benefits than males from selecting more MHC diverse mates, and thus the ability to identify heterozygosity of potential mates is particularly important. Moderate phylogenetic heritability observed in our study implies there may be evolutionary constraints on the expression of mate choice for MHC heterozygous mates. This could be related to olfactory signaling potential that varies among species (Niimura 2009). Our findings also suggest that power to detect significant effects increases when using more MHC classes and loci-allowing for greater variation in individual diversity. Nonhuman primate studies did not use both MHC classes and they rarely used multiple loci in their mate choice research, so the comparatively weaker effect sizes could be due to the lower variation available compared to human studies. Additionally, non-human primate effect sizes measured mating outcomes where MHC mating effects, if existing, are likely to be smaller than those from more controlled human experimental studies that measured mating preferences. We did not find mean effect size differences between human odor and facial preference tests, suggesting that humans may be similarly sensitive to both odor and visual cues of diversity (Penn & Potts 1998 (Hamilton & Zuk 1982). Facial attractiveness may be a particularly effective indicator of health and condition, and of individual heterozygosity (Roberts et al. 2005b;Lie et al. 2008).
To illustrate, the faces of mixed-ethnicity individuals, who tend to be more heterozygous than average, have been shown to be more attractive than single-ethnicity faces across genders and cultures (Rhodes et al. 2005;Lewis 2010;Little et al. 2012). In addition, work using facial image manipulations demonstrated that cues to heterozygosity are themselves attractive independent of other accompanying indicators of heterosis (Lewis 2010;Little et al. 2012). How much MHC heterozygosity would make an individual attractive? The majority of human studies (9/10) measured MHC diversity categorically, with individuals homozygous at one or more loci classified as "homozygous" and all others as "heterozygous".
Therefore, whether increasingly higher levels of heterozygosity or some optimal level of heterozygosity is favored remains to be determined. Furthermore, preferences for heterozygosity and dissimilarity need not be exclusive. A preference for both heterozygosity and some degree of similarity is possible (where for each level of similarity, relative heterozygotes are preferred), as demonstrated empirically and by models of correlations between heterozygosity and measures of genetic similarity (Roberts et al. 2005b; Roberts et al. 2006).

Mate choice for MHC-dissimilarity is not consistent
Non-human primates tend to consistently prefer MHC-dissimilar mates (Figure 4a), but our power to detect a significant average effect is limited by our small sample size (N=6) and by the addition of random effects to control for pseudoreplication. However, meta-analytic power analyses predict that 11 additional effect sizes (total N=17) should be sufficient to detect a biologically relevant effect (Zr = 0.15, explaining 2.25% variance) of MHC-dissimilarity on mating patterns at 80% power (Table S17). In contrast to the unidirectional trend for dissimilarity in non-human primates (posterior mean Zr (HPD) = 0.109 (-0.194 to 0.404)), humans show great variation in direction and magnitude of effect sizes ( Figure   4a). Directional variation could partly be explained by the unit of investigation; we found pairs had significantly stronger preferences for MHC similarity than males. Human males are thought to be less choosy than females since they tend to invest less in offspring, can reproduce at a faster rate, and have higher reproductive variance (Trivers 1972;Puts 2012). Thus, this significant contrast may reflect
assortative mating in pairs versus male indifference. The contrast between pairs and females was not significant, as females had a greater range of preferences.
One interpretation is that pairs represent mate choice outcomes, in contrast to individual preferences of males and females. Because actual choice of partners is influenced by many socioeconomic conditions including ethnicity, nationality, family relatedness, phenotypic similarity, and spatial segregation (reviewed in Kalmijn 1998; Bovet et al. 2012; Nojo et al. 2012) that can result in genetically assortative pairings, the apparent preference for MHC-similarity in pairs may be coincidental. In effect, ethnic heterogeneity within a sample will produce patterns of positive assortative mating at the ethnic level, where spouses are more similar at a genome-wide level than random pairs of individuals (Chaix et al. 2008). We found strong support for this explanation in our findings of significant effects for similarity within couples sampled from ethnically heterogeneous populations, but not from homogeneous populations or from experimental studies that control for potential ethnic biases. The variation in human MHC dissimilarity preference but single-direction trend in non-human primates suggests that selection pressure for MHC dissimilar mates may be sensitive to environmental or demographic perturbations primarily affecting humans. In contrast to non-human primates, humans have undergone a recent population expansion (Kaessmann et al. 2001) and show extensive levels of admixture (Lawson et al. 2012), while most non-human primate populations are more genetically isolated and
homogeneous, as are, for example, wild great apes (Prado-Martinez et al. 2013). Considering that mate choice relies on interpreting cues about traits relative to their background frequency, large changes in the genetic composition of the mating pool may distort signals. For example, if population-specific targets of optimal offspring diversity exist, mixing those populations would produce conflicting optimal targets.

Methodological differences among studies
Even considering only experimental studies with much greater control over the statistical design, there was still high heterogeneity among human dissimilarity effect size magnitude and direction (Table S3, Figure 3). It could be argued that differences among studies in methods or statistical design are responsible (Havlicek & Roberts 2009). However, the majority of experimental human studies used the same statistical design as Wedekind et al. (1995), which treated the chosen individual as the unit of analysis. Only two odor studies used different designs: one repeated the analysis using a within-donor design and found that the analysis "yields virtually identical results". The other (Santos et al. 2005) used a chi-squared design, but also had a percentage of participants on birth control, so this effect size along with other effect sizes of pill-users was ultimately removed from the dataset and could not influence the results.
Sources of heterogeneity can arise through study design, how the outcome is measured, and through real biological differences between populations. Yet we emphasize that studies that differ in their methodologies can be combined for a meaningful meta-analysis. The meta-analysis of Kamiya et al. (2014) is a case in point, which combined studies employing a diversity of methods across a range of vertebrates and found a clear general pattern for individuals to prefer MHC dissimilar and diverse mates. In fact, combining research based on different methodologies ensures that the variance in effect-size patterns reflects this process and not any one methodological artifact (Lajeunesse 2010). We can then explore the factors that contribute to variation in MHC mate choice research to synthesize discordant results. One main source of heterogeneity we detected with statistical support was from combining different ethnic groups in a sample for studies that compared observed pairs to randomly created pairs.
Another methodological source of heterogeneity we tested was choice cue, because preferences based on different stimuli could show different patterns, but we found no strong evidence of this, and other studies have found a positive correlation between facial and scent attractiveness (Thornhill & Gangestad 1999; Thornhill et al. 2003). Additional methodological differences between human studies and their implications for results have been thoroughly discussed elsewhere (Wedekind et al. 2002; Havlicek & Roberts 2009; Derti et al. 2010; Winternitz & Abbate 2015). We note that we cannot unambiguously differentiate whether the differences between study outcomes are caused by methodology or biology, and it is likely that both types of mechanisms are in effect.

Study limitations
One potential source of type I (false positive) errors is biases in published effect sizes. Publication bias testing showed no evidence for p-hacking but did reveal that the average power of the studies analyzed was low, ranging from 24% to 57% across datasets, indicating that true effects may have gone undetected in those studies. Based on our mean effect sizes for dissimilarity (Zr = 0.044) and diversity (Zr = 0.153), we recommend study sample sizes of ~4051 and ~260, respectively, to detect true effects in primates 80% of the time. The magnitude of these mean effect sizes is typical for ecological data (Møller & Jennions 2002), but this translates to dissimilarity explaining only approximately 0.2% and diversity 2.3% of the variation in primate mating patterns. Clearly, MHC-mediated mate choice in humans and other primates is just one relatively small consideration of many involved in choosing a mate.
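The percentages of variation quoted above follow from back-transforming Fisher's Zr to a correlation coefficient and squaring it. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the only inputs are the two mean effect sizes stated in the text.

```python
import math


def variance_explained(zr):
    """Back-transform Fisher's Zr to r and return r squared as a percentage."""
    r = math.tanh(zr)
    return 100.0 * r ** 2


for label, zr in [("dissimilarity", 0.044), ("diversity", 0.153)]:
    print(f"{label}: Zr = {zr}, variance explained = {variance_explained(zr):.1f}%")
```

At effect sizes this small, Zr and r are nearly identical, so squaring Zr directly gives almost the same percentages.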
Another issue regarding type I error is multiple testing. In our study we used a large number of predictors, the majority being those from Kamiya et al. (2014) in order to get comparable results across different taxa (mammals, non-human primates and humans). We also performed a large number of tests for each MHC target. We appreciate that type I errors may occur when testing for multiple predictors, but highlight that the magnitude of the effects may be suggestive for designing future studies. It is reassuring to see that a significant variable in our study (choosy sex) was also found to be significant in a previous meta-analysis on MHC-diversity based mating in vertebrates (Kamiya et al. 2014).
Regarding type II (false negative) errors, retrospective power analysis of our own results indicates we have sufficient sample sizes to detect an effect of Zr = 0.15 (explaining ~2.2% variation in MHC mating patterns) for the human MHC-diversity model and for models of all human MHC-dissimilarity choice cue categories excluding odor preference studies. Sample sizes for non-human primates were too low to have a good chance of detecting true effects, but high power could easily be achieved with fewer than 20 additional effect sizes. Noise in the data may increase the risk of not finding biological effects. Therefore, we have tried to draw attention to models where power was limited and new data would be very helpful (i.e., human odor preference models, non-human primate models), to help advance the field of MHC-linked mate choice.
Another challenge linked with interpreting meta-analyses conducted across heterogeneous samples and study designs is the issue of confounding factors. Confounding factors not accounted for in original studies pose a substantial problem for the interpretation of all meta-analytical approaches, as they can increase the risk of missing a true effect. Specifically, only four studies in our dataset had (i) controlled for Pill effects, (ii) used subjects of the same ethnicity, and (iii) conducted an experimental study to control for confounding factors. Two of these studies (Wedekind et al. 1995; Wedekind & Furi 1997) investigated odor preferences and found positive effects of MHC dissimilarity (r = 0.3347, r = 0.11, respectively). The two other studies (Roberts et al. 2005a; Roberts et al. 2005b) investigated facial preferences and found negative effects of MHC dissimilarity (r = -0.2632, r = -0.2067).
Therefore, even studies that fulfill the strictest of conditions still find opposing effects of MHC-dissimilarity on human mate preferences (which may have a biological explanation if the two modalities work in complementary ways to optimize level of MHC diversity for offspring (Roberts et al. 2005a)).
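For readers comparing these correlation coefficients with the Zr values used elsewhere, the conversion is Fisher's z transformation, with sampling variance 1/(N - 3) as noted in the figure captions. The sketch below applies it to the four r values quoted above; the function names and dictionary labels are hypothetical, and no sample sizes are assumed because they are not given in this passage.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def sampling_variance(n):
    """Approximate sampling variance of Zr for n independent targets."""
    return 1.0 / (n - 3)


# r values as quoted in the text above.
reported_r = {
    "Wedekind et al. 1995 (odor)": 0.3347,
    "Wedekind & Furi 1997 (odor)": 0.11,
    "Roberts et al. 2005a (face)": -0.2632,
    "Roberts et al. 2005b (face)": -0.2067,
}

zr_values = {study: fisher_z(r) for study, r in reported_r.items()}
for study, zr in zr_values.items():
    print(f"{study}: r = {reported_r[study]:+.4f}, Zr = {zr:+.4f}")
```

The transformation preserves sign, so the opposing directions of the odor and facial preference studies carry over unchanged to the Zr scale.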
Our meta-analysis has shown that various moderators can impact the sizes and directions of results investigating MHC-linked primate mating and therefore can point to gaps in the research field to be addressed by future studies. Unfortunately, given our limited sample size, we were not able to incorporate multiple moderators within the same models. Thus, we cannot rule out a significant effect of MHC-dissimilarity on primate mate choice under certain conditions and a definitive answer awaits further studies.

Evidence for mate choice for MHC-optimality?
Considering only experimental studies that could control for potential socio-ethnic assortative biases we still found wide variation in effect sizes for MHC-dissimilarity. It seems prudent to consider that an alternative biological explanation may be at least partially responsible for the variation. The optimality hypothesis predicts that direction of preference either for MHC dissimilarity or some degree of allele matching may depend on the relative allelic diversity in the pool of potential mates (Aeschlimann et al. 2003; Milinski 2006). For example, Aeschlimann et al. (2003) showed that sticklebacks preferred dissimilar partners in simulated inbred populations and optimally dissimilar partners in simulated outbred populations. High heterogeneity in MHC-dissimilar mating preferences could reflect differences among individuals attempting to achieve "optimal dissimilarity" by preferring similar or dissimilar mates depending on ecological and demographic context (Milinski 2006; Roberts 2009).
Thus, MHC-based mate choice may be stronger or easier to detect in settings where there is less population genetic diversity and less heterogeneity in other factors which influence mate choice (Ober et al. 1997;Jacob et al. 2002;Chaix et al. 2008). Considering that the diversity of distinct HLA haplotypes (multi-locus set of linked alleles) per population typically ranges from 100s to 1000s (Gragert et al. 2013), individuals from isolated populations would most frequently encounter only a fraction of the diversity of haplotypes common in more outbred populations. For example, only 10 haplotypes make up the high frequency majority for the Hutterite community (N=1891 sampled) where a significant preference for MHC haplotype-dissimilar mating was detected (Ober et al. 1997). Matching at 6 alleles (for 6-locus haplotypes) could severely limit the potential antigenic detection range and/or interfere with maternal-fetal interactions (Ober et al. 1988;Ober et al. 1997;Lashley et al. 2015). The Hutterite results are in line with the hypothesis of mate selection disfavoring extreme MHC similarity (Derti et al. 2010), and our results also show that experimental study preferences for MHC-similar individuals had relatively
few matching alleles between mates (average of 1-3 alleles). It would be helpful if tests for MHC-dissimilarity preferences considered a higher range of potentially matching alleles (e.g., allele matching at 0-6 loci). Odor preference studies in particular could employ synthetic peptides that mimic individual alleles to have greater control of the range of allelic diversity (Milinski et al. 2005; Milinski et al. 2013).

Importance of direct and indirect fitness benefits
Our finding of greater mean effect size for MHC diversity compared to dissimilarity is in line with evidence from across all vertebrates (Zr = 0.113 vs Zr = 0.064, respectively, (Kamiya et al. 2014)).
Detecting MHC diversity in a mate may be easier than detecting dissimilarity, as diversity is expected to correlate positively with the mate's phenotypic condition, including health status or perception of health status through skin condition (Roberts et al. 2005), body mass (Thoß et al. 2011), and coloration dependent on infection status (Milinski & Bakker 1990), among others. Dissimilarity, on the other hand, should not be reflected by the mate's phenotype alone, as it depends on the combination of both mates' genotypes. Thus, this type of preference would require more sophisticated sensory mechanisms including self-referential capabilities. The evolutionary benefits of human mate choice for MHC diversity may include prolonged parental care and reduced risk of contracting disease for a partner and the offspring (Roberts et al. 2005b), in addition to the potential indirect benefits from transmission of advantageous genes to offspring by diverse mates (Brown 1997, 1999; Kempenaers 2007
human primates. For instance, the expression of a major direct fitness benefit, paternal care, is intense for humans but rare in non-human primates (reviewed in Fernandez-Duque et al. 2009). This association suggests that more promiscuous mating systems which tend to provide fewer resource-based benefits to females (Clutton-Brock 1989) have weaker effects of mate choice for MHC-diversity.

Conclusions and suggestions for future research
We found clear support for humans and a trend for non-human primates choosing more MHC-diverse mates. In contrast, we found extremely high heterogeneity and no such clear pattern in humans for choice for MHC-dissimilar mates. A key driver of this heterogeneity was whether or not ethnic heterogeneity in studies on couples was controlled. High heterogeneity among non-human primate studies still showed a consistent direction for MHC dissimilarity, which could stem from methodological differences but also from socio-ecological differences between populations and species (Setchell & Huchard 2010). In fact, we found preliminary evidence that the expression of MHC-based mate choice could depend on the mating system and the reproductive strategies of individuals within those systems, as well as the phylogenetic history among species.
Results of this study show clear priorities in how to design future human and non-human primate studies, including studies that: (1) are large scale (>=200 individual targets) and include power analysis, (2) focus on individuals from (ethnically) homogeneous populations with limited MHC-diversity (Ober et al. 1997;Jacob et al. 2002;Chaix et al. 2008) and test socio-ecologically sensitive hypotheses (e.g., how expression of preferences for diversity/dissimilarity/optimality may vary according to mating system or demography; Setchell & Huchard 2010), (3) explicitly test the optimality hypothesis (with the prediction of less variance around an optimal parental combination of alleles than combination under random mating; Forsberg et al. 2007) and consider experimentally adjusting MHC peptide diversity/overlap (Milinski et al. 2013), (4) use multiple loci, including different MHC classes (I and II) (Kamiya et al. 2014), (5) control for non-MHC variability-key in determining incidental or adaptive MHC-assortative
Finally, we emphatically call for more non-human primate studies to improve understanding of the evolutionary trajectory of human mate choice. We hope that our synthesis highlights the need for additional studies of the selective pressure of MHC genotype on mating decisions and provides direction for future research.

Data accessibility
The datasets compiled for our analyses and R code are deposited on Dryad (doi:10.5061/dryad.5003g).

Tables
Table 1. Comparison of models with varying random effects and their deviance information criteria (DIC) values for effects of MHC dissimilarity and diversity on mate choice for all species, only humans, and only non-human primates. All models include the intercept and the listed random effects. Values in bold specify the random effects used in subsequent models. Models within 2 DIC units are essentially equivalent (Spiegelhalter et al. 2002), so we chose to include random effects that would account for study and phylogenetic non-independence without overfitting the models.
Table 2. Heterogeneity estimates and deviance information criteria (DIC) for a set of random-effect only meta-analytical models for MHC dissimilarity and diversity. The heterogeneity (I²) value is the percent variance from a particular random factor over the sum of all variance components plus the mean variance, and was calculated from posterior means. The total I² (and the 95% HPD) is the sum of all variance components. The mode total variance is shown for comparison (where similar values indicate stable models). Phylogenetic heritability (H²) is the proportion of variance that can be explained by phylogenetic variance (Hadfield & Nakagawa 2010). The final random-effect models used in subsequent analyses are highlighted in bold. Rationale for using the bold models is explained in the methods. Model ID refers to models from Table 1.

White boxes indicate model estimates from human data, black boxes are from non-human primate data, and gray polygons indicate estimates from models using all data. We removed pill-user effect sizes from the combined primate and human dataset to control for its potentially confounding effect (see methods), and tested pill-use as a moderator for studies that investigated its effect or controlled for it. The meta-analytic mean is from the intercept-only model run with study ID and phylogeny as random effects (and study ID as the random effect for the human-data model). Dashed boxes highlight mean estimates with HPD intervals that do not overlap zero. Vertical bars indicate significant contrasts between moderator categories for humans and all data (*p-value < 0.05, see Table S10).
Zr effect sizes for human data were investigated by choice cue category, including facial preferences (white/yellow diamond), odor preferences (dark gray/blue diamond), these experimental categories combined (light gray/green polygon), and mate choice studies (black/maroon square). Points show the mean posterior estimate from the model, and error bars represent the 95% HPD interval. Numbers on the right-hand side of each panel indicate the number of effect sizes in each subgroup. We ran all models excluding potentially confounding pill-use effect sizes, except those models specifically testing for pill-use effect (only available for odor preference studies). The cross-diamond model estimates include only studies that had dichotomously classified raters as pill-users or not (i.e., Santos et al. (2005) provided an effect size for a group of raters in which 30% were taking birth control pills). The meta-analytic mean is from the intercept-only model run with study ID as a random effect.
Zr effect sizes and associated variance were extracted from each study for a) preference for MHC-dissimilarity (positive Zr values indicate preference for dissimilarity, negative for similarity) and b) preference for MHC-diversity. Black diamonds indicate means from random effect (RE) models for humans and non-human primates presented separately, and for all data combined, and error bars represent the meta-analytic variance (1/(N_study - 3)). N is the number of independent targets per study.
We investigated the effect of the moderator 'population' on the subset of Zr effect sizes that were negative (indicating preference for similarity). Models with the moderator 'population' were run between choice cue categories for combined experimental studies (odor and facial preference studies, light gray/green diamond) and for mate choice studies (black/maroon square). Points show the mean posterior estimate from the model, and error bars represent the 95% HPD interval. Numbers on the right-hand side of each panel indicate the number of effect sizes in each subgroup. Effect sizes from ethnically heterogeneous populations for mate choice studies (but not experimental studies) had a significant posterior mean estimate for similarity (mean Zr (95% HPD) = -0.142 (-0.275 to -0.017)).