TCS: A new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction
Abstract
Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs, and both are known to be sensitive to the accuracy of the underlying MSA. In this work we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from the BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction, using both an established simulated dataset and a novel empirical yeast dataset. For this purpose, we describe a novel lossless alternative to site filtering that involves over-weighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third-party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS to HoT, GUIDANCE, Gblocks and trimAl and found that it leads to significantly better estimates of structural accuracy as well as more accurate phylogenetic trees. Availability: TCS is part of the T-Coffee package, which is free and open source and can be downloaded from http://www.tcoffee.org/Packages/Stable/Latest; a web server is also available at http://tcoffee.crg.cat/tcs.
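The contrast between site filtering and over-weighting trustworthy columns can be illustrated with a minimal Python sketch. It is not the T-Coffee implementation: the function names `weight_columns` and `filter_columns`, the integer per-column scores, and the rule of repeating each column `max(1, score)` times are all assumptions made for illustration. The only point taken from the abstract is that weighting keeps every column, whereas filtering discards the low-scoring ones.

```python
from typing import Dict, List


def weight_columns(alignment: Dict[str, str], scores: List[int]) -> Dict[str, str]:
    """Repeat each column according to its reliability score (hypothetical rule).

    Assumption: a column with score s is written max(1, s) times, so no column
    is ever dropped and reliable columns simply count more often.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(scores)}:
        raise ValueError("every aligned sequence needs exactly one score per column")
    copies = [max(1, s) for s in scores]
    return {
        name: "".join(residue * n for residue, n in zip(seq, copies))
        for name, seq in alignment.items()
    }


def filter_columns(alignment: Dict[str, str], scores: List[int], threshold: int) -> Dict[str, str]:
    """Conventional site filtering: keep only columns scoring at least `threshold`."""
    keep = [s >= threshold for s in scores]
    return {
        name: "".join(residue for residue, k in zip(seq, keep) if k)
        for name, seq in alignment.items()
    }


# Toy alignment with made-up per-column scores (illustrative values only).
toy = {"seq1": "AC-G", "seq2": "ACTG", "seq3": "A-TG"}
toy_scores = [9, 2, 0, 5]

weighted = weight_columns(toy, toy_scores)
filtered = filter_columns(toy, toy_scores, threshold=5)
```

In this sketch the weighted alignment still contains the two low-scoring columns, while the filtered one retains only the first and last. Either result can then be passed to a tree-building program that treats every column equally.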
Origin: Files produced by the author(s)