RiboDB database: a comprehensive resource for prokaryotic systematics

Ribosomal proteins (r-proteins) are increasingly used as an alternative to ribosomal RNA (rRNA) for prokaryotic systematics. However, their routine use is difficult because r-proteins are often missing or wrongly annotated in complete genome sequences, and there is currently no dedicated exhaustive database of r-proteins. RiboDB aims to fill this gap. This comprehensive database, updated weekly, allows the fast and easy retrieval of r-protein sequences from publicly available complete prokaryotic genome sequences. The current version of RiboDB contains 90 r-proteins from 3,750 prokaryotic complete genomes encompassing 38 phyla/major classes and 1,759 different species. RiboDB is accessible at http://ribodb.univ-lyon1.fr and through ACNUC interfaces.

Modern prokaryotic systematics relies mainly on the analysis of the RNA component of the small subunit of the ribosome (SSU-rRNA) and small sets of housekeeping genes (Ludwig and Klenk 2001). The resulting phylogenies have provided interesting but partial information on the evolutionary history of prokaryotes because the corresponding genes do not contain enough phylogenetic signal to resolve all species relationships with confidence, especially at large and small evolutionary scales. The rise of genomics opens the way for the definition of a novel "gold standard" for prokaryotic systematics that may complement and eventually replace SSU-rRNA. In this respect, ribosomal proteins (r-proteins) were shown to be good candidates (Yutin et al. 2012; Ramulu et al. 2014) because they carry a phylogenetic signal that is consistent with that of larger gene sets such as core genes, while allowing tree inference in acceptable computation time. Moreover, r-proteins are believed to be the main informative molecules detected by MALDI-TOF mass spectrometry, a disruptive technology recently adopted for microbial identification in clinical laboratories (Welker and Moore 2011).
The use of r-proteins for phylogenetics and systematics requires the accurate and exhaustive identification of r-protein homologues. However, due to their small size and atypical amino acid composition, r-proteins are often missing or poorly annotated in genomic sequences (Yutin et al. 2012; Ramulu et al. 2014). This hampers their routine use for systematic purposes. In fact, despite a few attempts in the past 10 years (Nakao et al. 2004; Teeling and Gloeckner 2006), there is currently no up-to-date dedicated database of prokaryotic r-proteins.

Here, we present RiboDB, a comprehensive database of prokaryotic r-proteins built from the reannotation of complete genome sequences available in GenBank. RiboDB gathers sequences of all currently recognized r-protein families (90 r-families), with the exception of S1. The S1 domain is a conserved domain that occurs in a wide range of RNA-associated proteins, and its many repeats in the S1 r-protein prevent the reliable identification of S1 homologues. Homologues of each r-family are identified through a double approach combining reciprocal best-blast-hits (rBBH) and hidden Markov model (HMM) profiles (see Supplementary Materials online for details). A regularly updated knowledge database of manually curated r-proteins is used to select the most accurate seeds for rBBH searches and to build the HMM profiles. A scoring system is used to evaluate the accuracy of predicted r-proteins. This double approach was shown to generate very few false positives and false negatives.

RiboDB relies on the ACNUC database system (Gouy and Delmotte 2008) and can be accessed through ACNUC interfaces, including the web-based WWW-Query available at http://doua.prabi.fr/search/query_fam. A user-friendly website is also available at http://ribodb.univ-lyon1.fr. Both allow the retrieval of r-protein sequences (both amino acids and nucleotides) for user-defined sets of taxa and r-families in various formats (e.g., GenBank, fasta).
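The reciprocal best-hit criterion mentioned above can be illustrated with a minimal sketch. This is not the RiboDB pipeline: the HMM profile step and the scoring system are not shown, and the hit tables, identifiers (`seed_L2`, `geneA`, and so on) and function names are hypothetical. It assumes that similarity searches have already been run in both directions and summarized as (query, subject, bit score) tuples.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         backward: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (seed, gene) pairs that are each other's best hit."""
    fwd = best_hits(forward)    # seed -> best gene in the genome
    bwd = best_hits(backward)   # gene -> best seed in the seed set
    return sorted((seed, gene) for seed, gene in fwd.items()
                  if bwd.get(gene) == seed)


# Illustrative, made-up scores: seeds searched against a genome ...
forward_hits = [
    ("seed_L2", "geneA", 310.0),
    ("seed_L2", "geneB", 95.0),
    ("seed_S7", "geneC", 120.0),
]
# ... and the genome's genes searched back against the seed set.
backward_hits = [
    ("geneA", "seed_L2", 305.0),
    ("geneB", "seed_L2", 90.0),
    ("geneC", "seed_X", 130.0),
    ("geneC", "seed_S7", 118.0),
]

pairs = reciprocal_best_hits(forward_hits, backward_hits)
print(pairs)
```

In this toy input, `geneC` is the best hit of `seed_S7`, but its own best hit is a different seed, so the pair is rejected. This kind of asymmetry is what the reciprocal criterion is meant to filter out.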
Three output files are

Brief communication

Keywords

Phylogeny prokaryote systematics ribosomal protein taxonomy evolution

Domains

Evolution [q-bio.PE]

Contributor: Damien M. de Vienne

https://hal.science/hal-01800061

Submitted on: Friday, October 11, 2019, 10:54:07

Last modified on: Wednesday, January 24, 2024, 03:33:34

Dates and versions

hal-01800061, version 1 (11-10-2019)

Identifiers

HAL Id: hal-01800061, version 1
DOI: 10.1093/molbev/msw088
PUBMED: 27189556

Cite

Frédéric Jauffrit, Simon Penel, Stéphane Delmotte, Carine Rey, Damien M de Vienne, et al. RiboDB database: a comprehensive resource for prokaryotic systematics. Molecular Biology and Evolution, 2016, 33 (8), pp. 2170-2172. ⟨10.1093/molbev/msw088⟩. ⟨hal-01800061⟩

Collections

CNRS INRIA UNIV-LYON1 BIOENVIS LBBE UDL ANR
