Proteins as Functional Units of Biocalcification – An Overview

High-throughput approaches such as genomics, transcriptomics and proteomics have led to the discovery of a larger set of biomineralization genes than previously foreseen. These gene lists are often difficult to interpret in light of the current models of calcification. Here we give an overview of the proteins available in UniProt (Universal Protein Resource) that were identified directly in metazoan calcium carbonate mineralized structures or that are known to have direct key functions in calcification processes. Functional annotation of the protein datasets using Gene Ontology reveals that functions such as carbohydrate binding, structural activity and catalytic activity (e.g. hydrolase) are commonly represented across the organic matrices.


Introduction
In the metazoan world, many organisms synthesize mineralized structures of calcium carbonate (CaCO3) through a physiological process called biocalcification. A significant number of calcifying species live in the aquatic environment and belong to distinct groups including sponges, corals, brachiopods, mollusks, crustaceans, echinoderms and chordates. A common feature among these groups of organisms is the production of CaCO3-mineralized structures by a biologically controlled mechanism. Indeed, this process is highly regulated at all stages of biomineral formation [1]. In particular, it requires: (1) the delineation of the mineralizing space by cell membranes or polymers; (2) the formation of an array of macromolecules and its subsequent targeting to the site of mineralization; (3) the pumping of inorganic ionic precursors to set up a saturated environment; and finally (4) the control exerted by certain proteins over crystal nucleation, growth and inhibition, providing a scaffold for mineral deposition. Although biologically controlled mineralization is present in microorganisms such as bacteria (magnetotactic [2]), unicellular algae (diatoms [3], coccolithophores [4]) and protozoans (ciliates [5]), the process is undeniably more broadly represented in animals. Non-vertebrates in particular can produce a wide range of external CaCO3 biominerals such as shells (in mollusks and brachiopods), carapaces (in crustaceans), exoskeletons (in corals and sponges) and calcified tubes (in serpulid annelids), but also internal ones like spicules and spines (in sponges and echinoderms, respectively) and gastroliths (in crustaceans). In vertebrates, CaCO3-mineralized tissues are less common. The main examples are fish otoliths [6] and otoconia [7], whose primary function is to provide the sense of gravity, balance, movement and direction, and eggshells, which provide a hard outer protection to avian eggs [8].
It is common to describe biominerals as organo-mineral assemblages where the mineral, the major component, is embedded in a minor organic matrix. The latter is often a complex mixture of macromolecules (proteins, glycoproteins, polysaccharides and lipids) produced by the organisms and directly involved in the regulation of mineral deposition. In most cases, the macromolecules are occluded within the CaCO3 during its growth, constituting the "organic fraction" of the biomineral, more generally called the organic matrix (OM). Although there is still much to learn about the exact mechanisms by which the components of the OM, in particular the proteins, control the formation of calcified tissues, some key features of these proteins are recognized as being functionally important. The first common trait is a high content of aspartic acid and, to a lesser degree, glutamic acid. This feature is frequently observed in proteins of mollusk shells and coral exoskeletons [9]. Because of their multiple negative charges, these proteins are recognized to strongly inhibit crystal growth and are suspected to control crystal morphology by selective adsorption on mineral surfaces [9][10][11][12]. Another interesting feature of OM proteins is their post-translational modifications, in particular glycosylation and phosphorylation, which can greatly contribute to the polyanionicity of such proteins [13][14][15][16]. Besides glycoproteins, the OM contains polysaccharides, which are important for structuring [17] the organic-mineral framework but also for interacting directly with the crystal surfaces [13]. Enzymes have also been identified in organic matrices, the best example being carbonic anhydrase. The function of this family of proteins is to catalyze the reversible hydration of CO2, forming one bicarbonate ion and one proton, according to the following reaction: H2O + CO2 ⇌ H2CO3 ⇌ HCO3- + H+.
Since the first report by Miyamoto and co-workers of Nacrein, a modular protein with two carbonic anhydrase domains separated by an acidic domain, from the shell of the pearl oyster Pinctada fucata [18], many proteins in shells and other structures have been identified one by one using classical molecular biology and biochemistry techniques. Until recently, the existing information was manageable and consisted of some proteins with very specific signatures: secretion signals, low complexity regions, repeats and some predictable domains (acidic, carbonic anhydrase, chitin-binding). The application of high-throughput techniques (genomics, transcriptomics and proteomics) has changed the perspective by revealing a much wider range of proteins that may take part in the control of biocalcification [19][20][21][22]. This clearly suggests broader functions for the OM proteins, such as cross-communication between the mineral-producing tissue and the extracellular calcifying matrix, which go far beyond the catalysis of crystal nucleation and the guiding of mineral growth via binding to specific crystal surfaces. Here we present an overview of the proteins identified in CaCO3-mineralized tissues of some metazoan groups that are currently available in UniProtKB [23,24]. We characterize this protein dataset from a functional viewpoint by using the Gene Ontology (GO) terms associated with the sequences [25].

Functional analysis of CaCO3-biomineralization proteins: data gathering
Keyword searches related to CaCO3-biomineralization processes were performed against UniProtKB in March 2013 with taxonomic filters for metazoan non-vertebrate aquatic calcifying phyla: Porifera, Cnidaria, Brachiopoda, Mollusca and Echinodermata. The output was manually checked in order to obtain a representative set of reference proteins (Table 1). The most represented group is by far Mollusca, owing to their economic importance as a food source and as 'providers' of cultured pearls. Consequently, recent years have seen an increased application of high-throughput approaches to molluscan tissues in combination with available genomes [26]. Because of the importance of reef ecosystems in connection with global environmental concerns such as ocean acidification, more sequences related to calcification have recently been identified from coral skeletons [19,22]. However, the number of sequences available for these organisms is still low. For echinoderms, too, the number of confirmed proteins remains limited [27]. Finally, sponges and brachiopods are not popular models in biomineralization studies and have the lowest publication rate in the field, particularly regarding molecular characterization of the organic matrix [28,29]. They consequently remain the groups with the lowest number of identified proteins involved in calcification. Each set of proteins was characterized for its content in GO terms belonging to the categories Molecular Function (MF), Cellular Component (CC) and Biological Process (BP). For each group, we selected the top 5 GO terms with more than one hit (Fig. 1 and Fig. 3). Biological Process (BP) terms were included only for mollusks (Fig. 2), as only this group of aquatic organisms had specific ontologies for the biomineralization process (shell calcification, GO:0031215).
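The selection step described above (the top 5 GO terms with more than one hit per group) can be sketched as follows. This is a minimal illustration, not the procedure used for the figures: the function name `top_go_terms`, the protein names and the toy annotations are hypothetical, and each GO term is assumed to count once per protein.

```python
from collections import Counter


def top_go_terms(annotations, n=5, min_hits=2):
    """Return up to n (GO term, hit count) pairs with at least min_hits hits.

    annotations: dict mapping a protein identifier to its list of GO terms.
    A term is counted once per protein, even if listed several times.
    """
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))
    # Keep only terms with more than one hit (min_hits=2 by default).
    frequent = [(term, hits) for term, hits in counts.items() if hits >= min_hits]
    # Sort by decreasing hit count, then by term ID for a stable order.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent[:n]


# Hypothetical toy data: protein names and term assignments are illustrative only.
toy_group = {
    "protein_A": ["GO:0005576", "GO:0046872"],
    "protein_B": ["GO:0005576", "GO:0030246"],
    "protein_C": ["GO:0005576", "GO:0046872", "GO:0004089"],
}

for term, hits in top_go_terms(toy_group):
    print(term, hits)
```

Terms supported by a single protein are dropped before ranking, so a group with few annotated proteins may return fewer than five terms.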
GO terms provide good predictions of the function of individual proteins, and more broadly of the whole organic matrix, since a good share of the proteins listed in Table 1 are associated with ontologies. Fig. 1 summarizes the top GO terms, by category, associated with shell matrix proteins in mollusks. The main protein functions relate to the catalytic activity of carbonic anhydrases (e.g. nacreins) and hydrolases, to several types of binding, such as metal ion binding and carbohydrate and polysaccharide binding (e.g. perlucin, lustrin A), and to enzyme regulatory activity (e.g. the proteins PSPI1 and NSPI1, which contain protease inhibitor domains).

Biomineralization: From Fundamentals to Biomaterials & Environmental Issues
As discussed in earlier works [21,22,30], because biomineral formation takes place extracellularly in each of the phyla considered, the proteins are expected to have "extracellular region" and "proteinaceous extracellular matrix" as the main GO terms associated with protein localization.

[Figure panel, molecular function (MF) GO terms: binding (GO:0005488); enzyme inhibitor activity (17); metalloendopeptidase inhibitor activity (2); peptidase regulator activity]

In corals and echinoderms, the top molecular functions (MF, Fig. 3) are enzymatic and binding/structural. Protein localization is distributed about equally between the extracellular matrix and the cell membrane. This suggests a higher content of transmembrane proteins [22,31] in the OM of these organisms than in the OM of mollusks, for which the proportion of proteins targeted to the extracellular space is exceptionally high. These differences may reflect fundamental differences in the molecular process of biomineral formation between corals and echinoderms, on the one hand, and mollusks, on the other: while the former may involve both extracellular proteins and proteins with transmembrane domains, as has been shown for corals [22], extracellular proteins are the dominant constituents of shell matrices in mollusks.

Conclusions
The analysis of Gene Ontology terms associated with biomineralization proteins provides an additional source of information to characterize the organic matrix from a functional viewpoint. Indeed, automatic annotations cover a significant number of the proteins considered in this study: 50% in mollusks, 70% in corals and 83% in echinoderms. Still, it is important to alert those who discover novel proteins to the need to provide detailed information on the experimental characterization when submitting nucleotide and protein sequences to generalist databases (e.g. DDBJ/EMBL/GenBank, UniProtKB). Only with the help of bench scientists is it possible to ensure a faster update process and accurate information on sequence data that is immediately available to the community. Moreover, there are only a few biomineralization-specific ontologies. These include otolith and eggshell formation in vertebrates (e.g. otolith morphogenesis (GO:0032474), otolith mineralization (GO:0045299), eggshell formation (GO:0030703)), shell calcification in mollusks (GO:0031215) and chitin-based cuticle sclerotization in crustaceans (GO:0036340). GO vocabularies developed specifically for CaCO3 biomineralization in other organisms remain scarce. The community that studies organic matrix molecules can also contribute to enriching the currently available vocabulary by suggesting new terms to the GO curators. These initiatives will greatly contribute to faster in silico identification of biomineralization proteins in non-model organisms and lead to better automated annotations.
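The annotation coverage quoted above is the share of proteins in a group that carry at least one GO term. A minimal sketch of that calculation follows; the function name `annotation_coverage` and the toy data are hypothetical and do not reproduce the dataset of Table 1.

```python
def annotation_coverage(annotations):
    """Return the percentage of proteins with at least one GO term.

    annotations: dict mapping a protein identifier to its list of GO terms
    (an empty list means the protein has no GO annotation).
    """
    if not annotations:
        return 0.0
    annotated = sum(1 for terms in annotations.values() if terms)
    return 100.0 * annotated / len(annotations)


# Hypothetical toy group: five of six proteins carry at least one GO term.
toy_coverage_group = {
    "protein_1": ["GO:0005576"],
    "protein_2": ["GO:0046872"],
    "protein_3": ["GO:0005576", "GO:0030246"],
    "protein_4": ["GO:0004089"],
    "protein_5": ["GO:0005576"],
    "protein_6": [],
}

print(f"{annotation_coverage(toy_coverage_group):.0f}% of proteins annotated")
```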
To conclude, the analysis of GO terms contributes to a general overview of the proteins associated with CaCO3-mineralized tissues. Fig. 4 summarizes the main types of molecules in terms of primary structure, localization and possible role in calcification.