Journal articles

RiboDB database: a comprehensive resource for prokaryotic systematics

Abstract: Ribosomal proteins (r-proteins) are increasingly used as an alternative to ribosomal RNA (rRNA) for prokaryotic systematics. However, their routine use is difficult because r-proteins are often missing or wrongly annotated in complete genome sequences, and there is currently no dedicated exhaustive database of r-proteins. RiboDB aims to fill this gap. This weekly updated, comprehensive database allows the fast and easy retrieval of r-protein sequences from publicly available complete prokaryotic genome sequences. The current version of RiboDB contains 90 r-proteins from 3,750 prokaryotic complete genomes encompassing 38 phyla/major classes and 1,759 different species. RiboDB is accessible at http://ribodb.univ-lyon1.fr and through ACNUC interfaces.

Modern prokaryotic systematics relies mainly on the analysis of the RNA component of the small subunit of the ribosome (SSU-rRNA) and small sets of housekeeping genes (Ludwig and Klenk 2001). The resulting phylogenies have provided interesting but partial information on the evolutionary history of prokaryotes, because the corresponding genes do not contain enough phylogenetic signal to resolve all species relationships with confidence, especially at large and small evolutionary scales. The rise of genomics opens the way for the definition of a novel "gold standard" for prokaryotic systematics that may complement and eventually replace SSU-rRNA. In this respect, ribosomal proteins (r-proteins) were shown to be good candidates (Yutin et al. 2012; Ramulu et al. 2014) because they carry a phylogenetic signal that is consistent with that of larger gene sets such as core genes, while allowing tree inference in acceptable computation time. Moreover, r-proteins are believed to be the main informative molecules detected by MALDI-TOF mass spectrometry, a disruptive technology adopted recently for microbial identification in clinical laboratories (Welker and Moore 2011).
The use of r-proteins for phylogenetics and systematics requires the accurate and exhaustive identification of r-protein homologues. However, due to their small size and atypical amino acid composition, r-proteins are often missing or badly annotated in genomic sequences (Yutin et al. 2012; Ramulu et al. 2014). This hampers their routine use for systematic purposes. In fact, despite a few attempts in the past 10 years (Nakao et al. 2004; Teeling and Gloeckner 2006), there is currently no up-to-date dedicated database of prokaryotic r-proteins.

Here, we present RiboDB, a comprehensive database of prokaryotic r-proteins built from the reannotation of complete genome sequences available in GenBank. RiboDB gathers sequences of all currently recognized r-protein families (90 r-families), with the exception of S1. S1 is excluded because this r-protein contains many repeats of the S1 domain, a conserved domain occurring in a wide range of RNA-associated proteins, which prevents the reliable identification of S1 homologues. Homologues of each r-family are identified through a double approach combining reciprocal best BLAST hits (rBBH) and hidden Markov model (HMM) profiles (see Supplementary Materials online for details). A regularly updated knowledge database of manually curated r-proteins is used to select the most accurate seeds for rBBH searches and to build the HMM profiles. A scoring system is used to evaluate the accuracy of predicted r-proteins. This double approach was shown to generate very few false positives and false negatives.

RiboDB relies on the ACNUC database system (Gouy and Delmotte 2008) and can be accessed through ACNUC interfaces, including the web-based WWW-Query available at http://doua.prabi.fr/search/query_fam. A user-friendly website is also available at http://ribodb.univ-lyon1.fr. Both allow the retrieval of r-protein sequences (both amino acids and nucleotides) for user-defined sets of taxa and r-families in various formats (e.g., GenBank, fasta).
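To make the reciprocal best-hit idea mentioned above more concrete, here is a minimal Python sketch of the general rBBH principle. It is not the RiboDB pipeline, whose details are in the Supplementary Materials: the HMM profile step and the scoring system are left out, and the function names, sequence identifiers, and bit scores are all hypothetical. The sketch assumes that similarity searches have already been run in both directions and summarized as simple hit lists.

```python
from typing import Dict, List, Tuple

# Hypothetical hit table: query identifier -> list of (subject identifier, bit score).
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, str] = {}
    for query, candidates in hits.items():
        if candidates:  # queries without any hit are skipped
            best[query] = max(candidates, key=lambda c: c[1])[0]
    return best


def reciprocal_best_hits(seed_vs_genome: Hits,
                         genome_vs_seed: Hits) -> List[Tuple[str, str]]:
    """Return (seed, protein) pairs that are each other's best hit."""
    forward = best_hits(seed_vs_genome)
    backward = best_hits(genome_vs_seed)
    return sorted(
        (seed, protein)
        for seed, protein in forward.items()
        if backward.get(protein) == seed
    )


# Toy data (made up): two curated seeds searched against one genome, and back.
seed_vs_genome = {
    "seed_L2": [("prot_A", 310.0), ("prot_B", 55.0)],
    "seed_S3": [("prot_C", 120.0), ("prot_B", 40.0)],
}
genome_vs_seed = {
    "prot_A": [("seed_L2", 305.0)],
    "prot_B": [("seed_L2", 50.0)],
    "prot_C": [("seed_S10", 130.0), ("seed_S3", 118.0)],
}

pairs = reciprocal_best_hits(seed_vs_genome, genome_vs_seed)
print(pairs)
```

In this toy example only seed_L2 and prot_A are retained: prot_C is the best hit of seed_S3, but its own best hit is a different seed, so the pair is not reciprocal. A candidate rejected at this stage could still be examined by a complementary method, which is one plausible motivation for combining rBBH with profile searches.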
Three output files are
Brief communication
Document type: Journal articles

https://hal.archives-ouvertes.fr/hal-01800061
Contributor: Damien M. de Vienne
Submitted on : Friday, October 11, 2019 - 10:54:07 AM
Last modification on : Monday, October 4, 2021 - 2:52:05 PM


Citation

Frédéric Jauffrit, Simon Penel, Stéphane Delmotte, Carine Rey, Damien de Vienne, et al. RiboDB database: a comprehensive resource for prokaryotic systematics. Molecular Biology and Evolution, Oxford University Press (OUP), 2016, 33 (8), pp. 2170-2172. ⟨10.1093/molbev/msw088⟩. ⟨hal-01800061⟩
