Journal articles

A complementary circular code in the protein coding genes

Abstract : Recently, shifted periodicities 1 modulo 3 and 2 modulo 3 have been identified in protein (coding) genes of both prokaryotes and eukaryotes with autocorrelation functions analysing eight of the 64 trinucleotides (Arquès et al., 1995). This observation suggests that the trinucleotides are associated with frames in protein genes. In order to verify this hypothesis, the distribution of the 64 trinucleotides AAA,...,TTT is studied in both gene populations by using a simple method based on the trinucleotide frequencies per frame. In protein genes, the trinucleotides can be read in three frames: the reading frame 0 established by the ATG start trinucleotide, and frame 1 (resp. 2), which is frame 0 shifted by 1 (resp. 2) nucleotides in the 5'-3' direction. The occurrence frequencies of the 64 trinucleotides are then computed in the three frames. By classifying each of the 64 trinucleotides in its preferential occurrence frame, i.e. the frame associated with its highest frequency, three subsets of trinucleotides can be identified in the three frames. This approach is applied to the two gene populations. Unexpectedly, the same three subsets of trinucleotides are identified in these two gene populations: T0 = X0 ∪ {AAA,TTT} with X0 = {AAC,AAT,ACC,ATC,ATT,CAG,CTC,CTG,GAA,GAC,GAG,GAT,GCC,GGC,GGT,GTA,GTC,GTT,TAC,TTC} in frame 0, T1 = X1 ∪ {CCC} in frame 1 and T2 = X2 ∪ {GGG} in frame 2, each subset X0, X1 and X2 having 20 trinucleotides. Surprisingly, these three subsets have five important properties: (i) the property of maximal circular code for X0 (resp. X1, X2), allowing the automatic retrieval of frame 0 (resp. 1, 2) in any region of a protein gene model (formed by a series of trinucleotides of X0) without using a start codon; (ii) the DNA complementarity property C (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1, allowing the two paired reading frames of a DNA double helix to code simultaneously for amino acids; (iii) the circular permutation property P (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2, implying that the two subsets X1 and X2 can be deduced from X0; (iv) the rarity property, with an occurrence probability of X0 equal to 6 × 10^-8; and (v) the concatenation property, with a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum length (13 nucleotides) of the minimal window to automatically retrieve the frame, and an occurrence of the four types of nucleotides in the three trinucleotide sites, in favour of an evolutionary code. In the Discussion, the identified subsets T0, T1 and T2, rewritten in the three two-letter genetic alphabets purine/pyrimidine, amino/keto and strong/weak interaction, allow us to deduce that the RNY model (R = purine = A or G, Y = pyrimidine = C or T, N = R or Y) (Eigen & Schuster, 1978) is the closest two-letter codon model to the trinucleotides of T0. Then, these three subsets are related to the genetic code. The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins is observed, as six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in proteins of both prokaryotes and eukaryotes. © 1996 Academic Press Limited
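The complementarity and circular permutation properties stated in the abstract, and the per-frame classification method, can be checked with a short Python sketch. This is only an illustration under stated assumptions, not the authors' program: the function names (complement, permute, frame_counts, preferential_frames) are hypothetical, X0 is written out with 20 trinucleotides including GAG (required by the stated self-complementarity, since C(CTC) = GAG), and the toy gene is made up for the example.

```python
from collections import Counter
from itertools import product

# X0 with its 20 trinucleotides (GAG included; see the note above).
X0 = set("AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG GAT "
         "GCC GGC GGT GTA GTC GTT TAC TTC".split())

_PAIRING = str.maketrans("ACGT", "TGCA")

def complement(t):
    """DNA complementarity map C (reverse complement), e.g. C(AAC) = GTT."""
    return t.translate(_PAIRING)[::-1]

def permute(t):
    """Circular permutation map P, e.g. P(AAC) = ACA."""
    return t[1:] + t[0]

# Property (iii): X1 and X2 are deduced from X0 by permutation.
X1 = {permute(t) for t in X0}
X2 = {permute(t) for t in X1}
T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

def frame_counts(genes):
    """Count trinucleotide occurrences in frames 0, 1 and 2 of each gene.

    Each gene is assumed to start at its reading frame 0.
    """
    counts = [Counter(), Counter(), Counter()]
    for gene in genes:
        for frame in range(3):
            for i in range(frame, len(gene) - 2, 3):
                counts[frame][gene[i:i + 3]] += 1
    return counts

def preferential_frames(genes):
    """Map each of the 64 trinucleotides to the frame of its highest frequency.

    Ties (including trinucleotides that never occur) fall back to frame 0,
    which is an arbitrary choice made for this sketch.
    """
    counts = frame_counts(genes)
    totals = [sum(c.values()) or 1 for c in counts]
    pref = {}
    for letters in product("ACGT", repeat=3):
        t = "".join(letters)
        freqs = [counts[f][t] / totals[f] for f in range(3)]
        pref[t] = max(range(3), key=lambda f: freqs[f])
    return pref

# Property (ii): C(T0) = T0, C(T1) = T2, C(T2) = T1.
assert {complement(t) for t in T0} == T0
assert {complement(t) for t in T1} == T2
assert {complement(t) for t in T2} == T1
# The three subsets together cover all 64 trinucleotides.
assert len(T0 | T1 | T2) == 64

# Toy gene model: a concatenation of trinucleotides of X0 (made-up data).
toy_pref = preferential_frames(["GAAGTCAACCTG"])
```

On real data, preferential_frames would be called with a large population of protein coding sequences; a single toy sequence like the one above is far too short to reproduce the subsets T0, T1 and T2.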
Document type : Journal articles
Contributor : Admin Ligm
Submitted on : Wednesday, May 2, 2012 - 5:33:52 PM
Last modification on : Thursday, September 29, 2022 - 2:21:15 PM




Didier Arquès, C. J. Michel. A complementary circular code in the protein coding genes. Journal of Theoretical Biology, Elsevier, 1996, 182 (1), pp. 45-58. ⟨10.1006/jtbi.1996.0142⟩. ⟨hal-00693509⟩


