Preprints, Working Papers, ...

Compositionally constrained sites drive long branch attraction

Abstract : Accurate phylogenies are fundamental to our understanding of the pattern and process of evolution. Yet, phylogenies at deep evolutionary timescales, with correspondingly long branches, have been fraught with controversy resulting from conflicting estimates from models with varying complexity and goodness of fit. Analyses of historical as well as current empirical datasets, such as alignments including Microsporidia, Nematoda or Platyhelminthes, have demonstrated that inadequate modeling of across-site compositional heterogeneity, which is the result of biochemical constraints that lead to varying patterns of accepted amino acids along sequences, can lead to erroneous topologies that are strongly supported. Unfortunately, models that adequately account for across-site compositional heterogeneity remain computationally challenging or intractable for an increasing fraction of contemporary datasets. Here, we introduce “compositional constraint analysis”, a method to investigate the effect of site-specific amino acid diversity on phylogenetic inference, and show that more constrained sites with lower diversity and less constrained sites with higher diversity exhibit ostensibly conflicting signal for models ignoring across-site compositional heterogeneity. We demonstrate that more complex models accounting for across-site compositional heterogeneity can ameliorate this bias. We present CAT-PMSF, a pipeline for diagnosing and resolving phylogenetic bias resulting from inadequate modeling of across-site compositional heterogeneity based on the CAT model. Our analyses indicate that CAT-PMSF is unbiased. We suggest using CAT-PMSF when convergence of the CAT model cannot be assured. We find evidence that compositionally constrained sites are driving long branch attraction in two metazoan datasets and recover evidence for Porifera as the sister group to all other animals.
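
The idea of ranking alignment sites by amino acid diversity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which diversity measure is used, so the sketch assumes the effective number of amino acids per column, computed as the exponential of the Shannon entropy of observed residue frequencies. The function names, the gap characters and the threshold are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

# Characters treated as missing data (an assumption of this sketch).
GAP_CHARS = set("-?X")


def effective_number_of_amino_acids(column):
    """Return exp(Shannon entropy) of the residues observed in one column.

    A fully conserved column yields 1.0; a column with k equally frequent
    amino acids yields k. Columns with no residues yield 0.0.
    """
    residues = [c for c in column.upper() if c not in GAP_CHARS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def site_diversities(alignment):
    """Compute the diversity of every column of an aligned list of sequences."""
    return [
        effective_number_of_amino_acids("".join(col)) for col in zip(*alignment)
    ]


def split_by_constraint(alignment, threshold):
    """Split site indices into more constrained (diversity <= threshold)
    and less constrained (diversity > threshold) sets."""
    diversities = site_diversities(alignment)
    constrained = [i for i, d in enumerate(diversities) if d <= threshold]
    relaxed = [i for i, d in enumerate(diversities) if d > threshold]
    return constrained, relaxed


if __name__ == "__main__":
    # Toy alignment: four sequences, three sites.
    toy = ["AAK", "AGR", "ASD", "ATE"]
    print(site_diversities(toy))
    print(split_by_constraint(toy, threshold=2.0))
```

Under these assumptions, the two index sets could be used to extract sub-alignments and infer a tree from each, which is one way to check whether more and less constrained sites support different topologies.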

https://hal.archives-ouvertes.fr/hal-03764562
Contributor : Nicolas Lartillot
Submitted on : Tuesday, August 30, 2022 - 1:28:22 PM
Last modification on : Saturday, September 24, 2022 - 2:36:04 PM

File

2022.03.03.482715v1.full.pdf
Files produced by the author(s)

Citation

Lenard L Szantho, Nicolas Lartillot, Gergely Szöllősi, Dominik Schrempf. Compositionally constrained sites drive long branch attraction. 2022. ⟨hal-03764562⟩
