https://hal.archives-ouvertes.fr/hal-00693990
Title: An evolutionary model of a complementary circular code
Authors: Arquès, Didier; Fallot, Jp; Michel, Cj
Affiliation (Arquès, Didier): LIGM - Laboratoire d'Informatique Gaspard-Monge - UPEM - Université Paris-Est Marne-la-Vallée - ENPC - École des Ponts ParisTech - ESIEE Paris - Fédération de Recherche Bézout - CNRS - Centre National de la Recherche Scientifique
Source: HAL CCSD, 1997
Record: Ligm, Admin; 2012-05-03 11:44:22; 2022-09-29 14:21:15
Language: en
Type: Journal articles
DOI: 10.1006/jtbi.1996.0305

The subset X(0) = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 20 trinucleotides occurs preferentially in frame 0 (the reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X(0) has the rare property (probability 6 × 10⁻⁸) of being a complementary maximal circular code with two permuted maximal circular codes X(1) and X(2) in frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5'-3' direction). X(0) is called a C³ code. A quantitative study of the three subsets X(0), X(1) and X(2) in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide position in the sequences. The frequencies of X(0), X(1) and X(2) in frame 0 of the eukaryotic protein genes are 48.5%, 29% and 22.5% respectively. These properties are not observed in the 5' and 3' regions of eukaryotes, where X(0), X(1) and X(2) occur with variable frequencies around the random value (1/3). Several unexpected frequency asymmetries, e.g. the frequency difference between X(1) and X(2) in frame 0, are related to a new property of the C³ code X(0) involving substitutions.
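
The set-level properties stated above can be checked with a short Python sketch. It assumes that X(1) and X(2) are obtained from X(0) by rotating each trinucleotide left by one and two letters respectively (the abstract does not spell out the direction of the permutation), and the helper names `reverse_complement` and `rotate` are illustrative only. The checks cover self-complementarity and the absence of permuted pairs, which are necessary conditions for a complementary circular code; they do not by themselves prove circularity or maximality.

```python
# Translation table for nucleotide complementation (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def reverse_complement(tri: str) -> str:
    """Complement each nucleotide and reverse the reading direction."""
    return tri.translate(COMPLEMENT)[::-1]


def rotate(tri: str, n: int) -> str:
    """Circularly rotate a trinucleotide left by n letters (assumed permutation)."""
    return tri[n:] + tri[:n]


# Assumption: X(1) and X(2) are the one- and two-letter rotations of X(0).
X1 = {rotate(t, 1) for t in X0}
X2 = {rotate(t, 2) for t in X0}
PERIODIC = {"AAA", "CCC", "GGG", "TTT"}

# X(0) is closed under reverse complementation.
self_complementary = {reverse_complement(t) for t in X0} == X0
# No trinucleotide of X(0) is a rotation of another one.
pairwise_disjoint = not (X0 & X1) and not (X0 & X2) and not (X1 & X2)
# Together with the four periodic trinucleotides, the three sets cover all 64.
covers_all = len(X0 | X1 | X2 | PERIODIC) == 64
# Under the assumed rotation, X(1) and X(2) are reverse complements of each other.
x1_x2_complementary = {reverse_complement(t) for t in X1} == X2

print(self_complementary, pairwise_disjoint, covers_all, x1_x2_complementary)
```
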
An evolutionary model with three parameters (p, q, k), based on an independent mixing of the 20 codons (trinucleotides in frame 0) of X(0) with equiprobability (1/20), followed by k ≈ 5 substitutions per codon in the three codon sites in proportions p ≈ 0.1, q ≈ 0.1 and r = 1 - p - q ≈ 0.8 respectively, retrieves the frequencies of X(0), X(1) and X(2) observed in the three frames of protein genes and explains these asymmetries. © 1997 Academic Press Limited.
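
A rough Monte Carlo reading of this model can be sketched in Python. This is not the authors' method, and several details are assumptions because the abstract does not specify them: each of the k substitutions picks a codon site with probabilities (p, q, r) and replaces the nucleotide there with one of the other three chosen uniformly; X(1) and X(2) are taken as the one- and two-letter left rotations of X(0); and the function names `simulate` and `frame_frequencies` are hypothetical.

```python
import random

X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)
# Assumption: X(1) and X(2) are left rotations of X(0) by one and two letters.
X1 = {t[1:] + t[:1] for t in X0}
X2 = {t[2:] + t[:2] for t in X0}


def simulate(n_codons: int, k: int, p: float, q: float, rng: random.Random) -> str:
    """Draw codons uniformly from X(0), then apply k substitutions to each."""
    codons = sorted(X0)           # fixed order so a seeded run is reproducible
    weights = [p, q, 1 - p - q]   # substitution proportions for sites 1, 2, 3
    sequence = []
    for _ in range(n_codons):
        codon = list(rng.choice(codons))
        for _ in range(k):
            site = rng.choices([0, 1, 2], weights=weights)[0]
            # Assumption: the new nucleotide is uniform over the other three.
            codon[site] = rng.choice([n for n in "ACGT" if n != codon[site]])
        sequence.append("".join(codon))
    return "".join(sequence)


def frame_frequencies(seq: str, frame: int) -> tuple:
    """Fractions of trinucleotides in X(0), X(1), X(2) when read in a frame."""
    tris = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return tuple(sum(t in s for t in tris) / len(tris) for s in (X0, X1, X2))


if __name__ == "__main__":
    rng = random.Random(0)
    seq = simulate(n_codons=100_000, k=5, p=0.1, q=0.1, rng=rng)
    for frame in range(3):
        print(frame, frame_frequencies(seq, frame))
```

The three fractions returned for a frame need not sum to 1, since the periodic trinucleotides AAA, CCC, GGG and TTT belong to none of the three sets. With k = 0 the sequence is an unmodified concatenation of X(0) codons, so frame 0 contains only X(0) trinucleotides, which gives a quick sanity check of the sketch.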