
Evolutionary distances between nucleotide sequences based on the distribution of substitution rates among sites as estimated by parsimony

Abstract : The rate of evolution of macromolecules such as ribosomal RNAs and proteins varies along the molecule because structural and functional constraints differ between sites. Many studies have shown that ignoring this variation when computing evolutionary distances leads to severe underestimation of sequence divergence and can thus mislead evolutionary tree inference. We propose here a new parsimony-based method for computing evolutionary distances between pairs of sequences that takes this variation into account and estimates it from the data. The method applies to the number of substitutions per site in ribosomal RNA genes as well as to the number of nonsynonymous substitutions per codon in protein-coding genes, and it is especially suitable when large data sets (≥ 100 sequences) are analyzed. First, starting from a phylogeny constructed with the usual distances, the maximum-parsimony method is used to infer the distribution of the number of substitutions that have occurred at each site (or codon) along this tree. This distribution is then fitted to an "invariant + truncated negative binomial" distribution that allows for invariant sites. Maximum-likelihood fitting of this distribution to different data sets showed that it agreed very well with real data; notably, allowing for invariant sites appeared to be very important. Finally, two distance estimates were developed by introducing the distribution of site variability into the substitution models of Jukes and Cantor and of Kimura. Using different numbers of aligned sequences (up to 1,000 rRNA sequences) showed that the model parameters are very sensitive to the number of sequences used to estimate them. However, if at least 100 sequences are considered, the two new distance estimates are quite stable with respect to the number of sequences used to fit the distribution, and this stability holds for both low and high evolutionary distances.
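The parsimony step (counting the minimum number of substitutions at each site on a fixed tree, then tabulating how many sites required 0, 1, 2 and more changes) can be sketched with Fitch's algorithm. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is rooted, binary and encoded as nested tuples, sequences are gap-free with unambiguous nucleotides, and the helper names `fitch_site`, `substitutions_per_site` and `count_distribution` are hypothetical. It stops at the count distribution and does not fit the invariant + truncated negative binomial model.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical tree encoding: a leaf is a sequence name (str),
# an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, column: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch's algorithm for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum
    number of substitutions needed within that subtree.
    """
    if isinstance(tree, str):
        # Leaf: the observed nucleotide (ambiguity codes and gaps are not handled).
        return frozenset(column[tree]), 0
    left, right = tree
    left_set, left_cost = fitch_site(left, column)
    right_set, right_cost = fitch_site(right, column)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one extra substitution is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def substitutions_per_site(tree: Tree, alignment: Dict[str, str]) -> List[int]:
    """Minimum number of substitutions at every site of an aligned set."""
    length = len(next(iter(alignment.values())))
    counts = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        counts.append(fitch_site(tree, column)[1])
    return counts


def count_distribution(counts: List[int]) -> Dict[int, int]:
    """Number of sites that required k substitutions, for each observed k."""
    return dict(sorted(Counter(counts).items()))


if __name__ == "__main__":
    tree = (("s1", "s2"), ("s3", "s4"))
    alignment = {"s1": "ACGT", "s2": "ACGA", "s3": "ACTA", "s4": "GCTC"}
    counts = substitutions_per_site(tree, alignment)
    print(counts)                      # [1, 0, 1, 2]
    print(count_distribution(counts))  # {0: 1, 1: 2, 2: 1}
```

With equal substitution costs the Fitch count does not depend on where an unrooted tree is rooted, so any rooting of the initial distance tree gives the same per-site counts. The resulting distribution is the kind of data the abstract describes fitting with maximum likelihood.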
These new distances appeared to be much better estimates of the number of substitutions per site than the classical distances of Jukes and Cantor and of Kimura, which both greatly underestimate this number, so the new distances can also serve as indices for detecting saturation. We conclude that the new distances are particularly suitable for phylogenetic analysis when very distantly related species and relatively large data sets are considered. Trees reconstructed using these distances generally differ from those constructed with the classical estimates. Using this new method, we showed that the mean evolutionary distance between prokaryotes and eukaryotes is substantially higher for the small-subunit than for the large-subunit rRNAs. This suggests that the former might have experienced a drastic change during the early evolution of eukaryotes.
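Why classical distances underestimate divergence can be illustrated with the standard Jukes-Cantor correction and its textbook extension to a fraction of invariant sites plus gamma-distributed rates at the variable sites. A negative binomial count distribution is what Poisson substitution counts become when rates follow a gamma distribution, so a fitted shape parameter can be read as a gamma shape. This sketch assumes that invariant + gamma formula; whether it matches the paper's estimators in detail is not claimed, and the function names and the example values of `shape` and `f_inv` are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc_distance(p: float) -> float:
    """Classical Jukes-Cantor distance: all sites evolve at the same rate."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        raise ValueError("saturated: p >= 3/4, distance undefined")
    return -0.75 * math.log(arg)


def jc_inv_gamma_distance(p: float, shape: float, f_inv: float) -> float:
    """Jukes-Cantor distance with a fraction `f_inv` of invariant sites and
    gamma-distributed rates (shape `shape`) at the variable sites:

        d = 3/4 * shape * (1 - f_inv) * ((1 - 4p / (3(1 - f_inv)))**(-1/shape) - 1)
    """
    if not 0.0 <= f_inv < 1.0 or shape <= 0.0:
        raise ValueError("need 0 <= f_inv < 1 and shape > 0")
    arg = 1.0 - 4.0 * p / (3.0 * (1.0 - f_inv))
    if arg <= 0.0:
        raise ValueError("saturated: p >= 3/4 * (1 - f_inv), distance undefined")
    return 0.75 * shape * (1.0 - f_inv) * (arg ** (-1.0 / shape) - 1.0)


if __name__ == "__main__":
    p = 0.3  # observed proportion of differing sites
    print(f"{jc_distance(p):.3f}")                      # 0.383
    print(f"{jc_inv_gamma_distance(p, 0.5, 0.0):.3f}")  # 0.667
    print(f"{jc_inv_gamma_distance(p, 0.5, 0.3):.3f}")  # 1.167
```

The same observed divergence yields much larger corrected distances once rate variation and invariant sites are allowed for, which is the kind of underestimation by classical distances that the abstract reports. A `ValueError` signals that the observed divergence lies beyond what the model can correct, that is, saturation.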
Document type :
Journal articles
Contributor : Stéphane Delmotte
Submitted on : Monday, November 23, 2009 - 2:52:18 PM
Last modification on : Monday, October 4, 2021 - 2:52:05 PM





N.J. Tourasse, Manolo Gouy. Evolutionary distances between nucleotide sequences based on the distribution of substitution rates among sites as estimated by parsimony. Molecular Biology and Evolution, Oxford University Press (OUP), 1997, 14 (3), pp.287-298. ⟨10.1093/oxfordjournals.molbev.a025764⟩. ⟨hal-00434995⟩


