DOMO: a new database of aligned protein domains
Abstract
Domains are autonomously folding units that are combined into modular proteins 1. At the sequence level, accurately delineating the boundaries of homologous protein domains is essential for multiple sequence alignment. Tertiary structural data that could guide visual determination of such domain boundaries are not available for most proteins. Consequently, although many motif 2, block 3, and full-sequence-alignment 4 databases exist, as yet only two domain-alignment databases have been constructed by a fully automated process that uses sequence information alone 5,6. Here, we describe DOMO, a new database containing 8877 multiple sequence alignments, which cover 99 058 protein domains as well as repeating-sequence regions extracted from 83 054 non-redundant amino acid sequences from the SWISS-PROT 7 and PIR 8 databases. The domain boundaries and alignments were generated by a fully automated analysis process that first detects and clusters amino acid sequence similarities and then delineates the domain boundaries and builds multiple sequence alignments of related protein segments 9,10. The domain boundaries were not inferred from three-dimensional data. Instead, they were defined from the relative positions of homologous segment pairs within the same protein (for repeats) or within homologous proteins, measured with regard to each protein's N- or C-terminus. Compared with other databases, DOMO offers greatly improved completeness and accuracy of the protein classifications, correctness of the domain boundaries, and quality of the multiple sequence alignments 9,10.
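The idea of deriving domain boundaries from the positions of homologous segments relative to the protein termini can be illustrated with a minimal sketch. This is not the DOMO procedure itself, which is described in references 9 and 10. The function names, the 1-based coordinates, and the tolerance `tol` are assumptions made only for illustration: segment end points that fall well inside the protein are clustered, and each cluster yields one candidate boundary.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive; hypothetical format


def candidate_boundaries(length: int, segments: List[Segment], tol: int = 10) -> List[int]:
    """Cluster internal segment end points into candidate boundary positions.

    A returned position is the first residue of the domain that follows
    the boundary. `tol` is an illustrative tolerance, not a DOMO parameter.
    """
    points: List[int] = []
    for start, end in segments:
        # An end point close to a terminus is explained by the terminus itself,
        # so only end points lying well inside the protein are kept.
        if start - 1 > tol:
            points.append(start)
        if length - end > tol:
            points.append(end + 1)
    points.sort()

    # Chain end points that lie within `tol` residues of the previous one.
    clusters: List[List[int]] = []
    for p in points:
        if clusters and p - clusters[-1][-1] <= tol:
            clusters[-1].append(p)
        else:
            clusters.append([p])

    # Represent each cluster by its middle element.
    return [c[len(c) // 2] for c in clusters]


def split_into_domains(length: int, boundaries: List[int]) -> List[Segment]:
    """Turn sorted boundary positions into contiguous (start, end) intervals."""
    edges = [1] + boundaries + [length + 1]
    return [(a, b - 1) for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    # Made-up segments on a 300-residue protein, for demonstration only.
    segs = [(1, 98), (3, 100), (101, 200), (105, 300), (201, 299)]
    bounds = candidate_boundaries(300, segs)
    print(bounds)
    print(split_into_domains(300, bounds))
```

In this toy input, the similarity segments start or stop near residues 100 and 200, so the sketch proposes three intervals. A real pipeline would also have to weigh segment scores and resolve conflicting evidence, which this sketch ignores.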
Origin: Files produced by the author(s)