Conference poster, Year: 2022

CroMaSt: A workflow for domain family curation through cross-mapping of structural instances between protein domain databases

Abstract

Protein domains can be viewed as building blocks that are essential for understanding structure-function relationships in proteins. However, each domain database classifies protein domains using its own methodology. As a result, the boundaries between domains or families often differ from one database to another, which raises the question of how domains should be defined and enumerated. This question cannot be answered from a single database. Rather, expert integration and curation of several databases are required to refine the contours of a domain of interest, in a domain-centric approach. Here, we illustrate the role of 3-D structure in clarifying domain definition with the help of CroMaSt (“Cross-Mapper for Structural Domains”), a fully automated workflow that classifies all structural instances of a given domain into three categories (core, true and domain-like). CroMaSt is developed in the Common Workflow Language (CWL) and takes advantage of two well-known and widely used domain databases, Pfam (sequence-based) and CATH (structure-based). It uses the domain definitions from Pfam and CATH, together with the SIFTS resource, to cross-map structural instances between these two sources. Structural alignments generated by Kpax make it possible to identify false-positive instances from each domain database. We tested CroMaSt on the RNA Recognition Motif (RRM), the most prevalent and diverse RNA-binding domain. Starting from the PF00076 and 3.30.70.330 domain families from Pfam and CATH, respectively, our workflow identifies 882 core, 966 true and 344 domain-like structural instances. The information generated by this method will play a crucial role in machine learning methods applied to domain-specific synthetic biology.
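
The three-way classification described above can be illustrated with a minimal Python sketch. The abstract does not state the decision rule, so everything here is an assumption made for illustration: instances found in both databases are treated as core, instances found in only one database are treated as true when a structural-similarity score against the core set reaches a cutoff, and the rest are treated as domain-like. The function name `classify_instances`, the instance identifiers, the score dictionary (a stand-in for a Kpax alignment score) and the `threshold` value are all hypothetical, and the real workflow cross-maps residue ranges through SIFTS rather than comparing identifiers directly.

```python
from typing import Dict, Set


def classify_instances(
    pfam: Set[str],
    cath: Set[str],
    score_to_core: Dict[str, float],
    threshold: float = 0.5,
) -> Dict[str, Set[str]]:
    """Split structural instances into core, true and domain-like sets.

    Assumed rule (not taken from the abstract):
      core        = instances present in both databases
      true        = instances in one database only, score >= threshold
      domain_like = instances in one database only, score < threshold
    """
    # Instances that cross-map between the two databases.
    core = pfam & cath
    # Instances reported by only one of the two databases.
    unmapped = (pfam | cath) - core
    # Hypothetical structural-similarity cutoff; a missing score counts as 0.
    true = {i for i in unmapped if score_to_core.get(i, 0.0) >= threshold}
    domain_like = unmapped - true
    return {"core": core, "true": true, "domain_like": domain_like}


if __name__ == "__main__":
    # Toy identifiers and scores, invented for this example only.
    pfam_instances = {"inst1", "inst2", "inst3"}
    cath_instances = {"inst2", "inst3", "inst4", "inst5"}
    scores = {"inst1": 0.8, "inst4": 0.2, "inst5": 0.6}

    result = classify_instances(pfam_instances, cath_instances, scores)
    for category in ("core", "true", "domain_like"):
        print(category, sorted(result[category]))
```

Under this assumed rule the three sets are disjoint and together cover every input instance, which matches the way the abstract reports a separate count for each category.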
Main file
CroMaSt_ECCB_RNAct_final.pdf (1.43 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03789541, version 1 (27-09-2022)

Identifiers

HAL Id: hal-03789541
DOI: 10.48546/WORKFLOWHUB.WORKFLOW.390.1

Cite

Hrishikesh Dhondge, Isaure Chauvot de Beauchêne, Marie-Dominique Devignes. CroMaSt: A workflow for domain family curation through cross-mapping of structural instances between protein domain databases. ECCB2022 - 21st European Conference on Computational Biology, Sep 2022, Sitges, Spain. ⟨10.48546/WORKFLOWHUB.WORKFLOW.390.1⟩. ⟨hal-03789541⟩