vALId: validation of protein sequence quality based on multiple alignment data.
Journal article, Journal of Bioinformatics and Computational Biology, Year: 2005

Abstract

Validating sequences is essential for accurate phylogeny and structure/function analysis. However, among the thousands of protein sequences available in the public databases, most have been predicted in silico and have not systematically undergone quality verification. It has recently become evident that they often contain sequence errors. To address the problem of automatic protein quality control, we have developed vALId, an interactive, web-interfaced software tool. Taking advantage of high-quality multiple alignments of complete protein sequences (MACS), vALId first warns about the presence of suspicious insertions and deletions (indels) and divergent segments, and second, proposes corrections based on transcripts and genome contigs. In a first evaluation test, hundreds of indels and divergent segments were randomly generated in a manually refined MACS. The sensitivity (Sn) and specificity (Sp) of indel detection were excellent (0.96), while the mean Sn (0.49) and Sp (0.56) of divergent segment delineation depended on the percent identity between sequence neighbors. In a second test, 6195 sequences in 100 MACS corresponding to different functional and structural protein families were analyzed. Of these sequences, 65% were in silico predictions, and 44% of the predicted eukaryotic proteins were partially incorrect, with at least one suspicious indel or divergent segment.
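
The abstract reports Sn and Sp without defining them. As a reading aid, the sketch below shows how such figures could be computed from detection counts. It assumes the convention common in sequence-prediction benchmarks, Sn = TP / (TP + FN) and Sp = TP / (TP + FP); the abstract does not confirm that the authors used these definitions. The class name `DetectionCounts` and the counts in the example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Outcome counts for one detection task (hypothetical helper, not part of vALId)."""

    true_positives: int   # real errors that were flagged
    false_positives: int  # flags raised where no error exists
    false_negatives: int  # real errors that were missed

    def sensitivity(self) -> float:
        # Sn = TP / (TP + FN): share of real errors that were detected.
        denominator = self.true_positives + self.false_negatives
        if denominator == 0:
            raise ValueError("Sn is undefined when there are no real errors")
        return self.true_positives / denominator

    def specificity(self) -> float:
        # Assumed definition: Sp = TP / (TP + FP), the share of flags that are correct.
        # Other fields define specificity as TN / (TN + FP) instead.
        denominator = self.true_positives + self.false_positives
        if denominator == 0:
            raise ValueError("Sp is undefined when nothing was flagged")
        return self.true_positives / denominator


# Illustrative counts only; these are not results from the paper.
counts = DetectionCounts(true_positives=90, false_positives=10, false_negatives=30)
print(f"Sn = {counts.sensitivity():.2f}, Sp = {counts.specificity():.2f}")
```

Under these assumed definitions, a low Sn means real errors go unflagged, while a low Sp means many of the flags are false alarms.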
No file deposited

Dates and versions

hal-00187446, version 1 (14-11-2007)

Identifiers

  • HAL Id: hal-00187446, version 1
  • PubMed: 16078368

Cite

Laurent Bianchetti, Julie Dawn Thompson, Odile Lecompte, Frederic Plewniak, Olivier Poch. vALId: validation of protein sequence quality based on multiple alignment data. Journal of Bioinformatics and Computational Biology, 2005, 3 (4), pp. 929-47. ⟨hal-00187446⟩