Compositionally constrained sites drive long branch attraction - HAL open archive
Preprint, Working Paper. Year: 2022

Compositionally constrained sites drive long branch attraction

Lenard L Szantho
  • Role: Author
Gergely Szöllősi
  • Role: Author
  • PersonId: 1159507
Dominik Schrempf
  • Role: Author
  • PersonId: 1159506

Abstract

Accurate phylogenies are fundamental to our understanding of the pattern and process of evolution. Yet phylogenies at deep evolutionary timescales, with correspondingly long branches, have been fraught with controversy resulting from conflicting estimates from models of varying complexity and goodness of fit. Analyses of historical as well as current empirical datasets, such as alignments including Microsporidia, Nematoda or Platyhelminthes, have demonstrated that inadequate modeling of across-site compositional heterogeneity, which results from biochemical constraints that lead to varying patterns of accepted amino acids along sequences, can lead to erroneous topologies that are strongly supported. Unfortunately, models that adequately account for across-site compositional heterogeneity remain computationally challenging or intractable for an increasing fraction of contemporary datasets. Here, we introduce "compositional constraint analysis", a method to investigate the effect of site-specific amino acid diversity on phylogenetic inference, and show that more constrained sites with lower diversity and less constrained sites with higher diversity exhibit ostensibly conflicting signals under models that ignore across-site compositional heterogeneity. We demonstrate that more complex models accounting for across-site compositional heterogeneity can ameliorate this bias. We present CAT-PMSF, a pipeline based on the CAT model for diagnosing and resolving phylogenetic bias resulting from inadequate modeling of across-site compositional heterogeneity. Our analyses indicate that CAT-PMSF is unbiased. We suggest using CAT-PMSF when convergence of the CAT model cannot be assured. We find evidence that compositionally constrained sites are driving long branch attraction in two metazoan datasets and recover evidence for Porifera as the sister group to all other animals.
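As an illustration of what "site-specific amino acid diversity" can mean in practice, the sketch below scores each alignment column and splits sites into a more constrained and a less constrained set. It is only a rough sketch under stated assumptions: the diversity measure (the exponential of the Shannon entropy of observed column frequencies), the function names `effective_diversity` and `split_sites_by_diversity`, the threshold, and the toy alignment are all hypothetical and are not taken from the abstract; the authors' actual measure and binning may differ.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and ambiguity codes are ignored.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def effective_diversity(column):
    """Return exp(Shannon entropy) of amino acid frequencies in one column.

    Assumption: this "effective number of amino acids" stands in for the
    site-specific diversity mentioned in the abstract. It is 1.0 for an
    invariant site and 0.0 for a column with no amino acids at all.
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def split_sites_by_diversity(alignment, threshold):
    """Split column indices into (more constrained, less constrained) lists.

    `alignment` is a list of equal-length aligned sequences; `threshold` is a
    hypothetical cut-off on the effective diversity.
    """
    constrained, relaxed = [], []
    for index, column in enumerate(zip(*alignment)):
        score = effective_diversity("".join(column))
        (constrained if score <= threshold else relaxed).append(index)
    return constrained, relaxed


# Toy alignment (made up for illustration): four taxa, four sites.
toy_alignment = ["MKAL", "MKSV", "MRTI", "MKGF"]
constrained_sites, relaxed_sites = split_sites_by_diversity(toy_alignment, threshold=2.0)
```

Each resulting subset of sites could then be analyzed separately, to check whether the more constrained and less constrained sites support different topologies under a given model.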
Main file
2022.03.03.482715v1.full.pdf (947.73 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03764562, version 1 (30-08-2022)

Identifiers

Cite

Lenard L Szantho, Nicolas Lartillot, Gergely Szöllősi, Dominik Schrempf. Compositionally constrained sites drive long branch attraction. 2022. ⟨hal-03764562⟩
54 views
42 downloads
