Revisiting Waiting Times in DNA Evolution
Abstract
Transcription factor binding sites are short stretches of DNA (or k-mers), mainly located in promoter sequences, that enhance or repress gene expression. With respect to an initial distribution of letters on the DNA alphabet, Behrens and Vingron (2010) consider a random sequence of length n that does not contain a given k-mer, or word of size k. Under an evolution model of the DNA, they compute the probability p_n that this k-mer appears after a unit time of 20 years. They prove that the waiting time for the first appearance of the k-mer is well approximated by T_n = 1/p_n. Their work relies on the simplifying assumption that the k-mer is not self-overlapping. Behrens et al. (2012) use an automaton-based approach that relaxes this assumption on word overlaps. Our new approach to the problem, based on clump analysis and generating functions, explains the quasi-linear behaviour of p_n observed for a large range of values of n. We present this clump analysis here, both by language decomposition and by an automaton construction.
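The two notions the abstract relies on, self-overlap of a k-mer and the approximation T_n = 1/p_n, can be illustrated with a minimal Python sketch. It is not the authors' method: the function names are hypothetical, the value of p_n is supplied by the caller rather than computed from an evolution model, and the k-mers are arbitrary examples.

```python
def overlap_lengths(word: str) -> list[int]:
    """Return every length j (0 < j < len(word)) for which the length-j
    prefix of `word` equals its length-j suffix.

    A word is self-overlapping exactly when this list is non-empty.
    """
    k = len(word)
    return [j for j in range(1, k) if word[:j] == word[-j:]]


def is_self_overlapping(word: str) -> bool:
    """True if two occurrences of `word` can overlap in a sequence."""
    return bool(overlap_lengths(word))


def approx_waiting_time(p_n: float) -> float:
    """Approximate waiting time T_n = 1/p_n, in unit time steps.

    `p_n` is assumed to be given (hypothetical input); this function
    does not derive it from any substitution model.
    """
    if not 0.0 < p_n <= 1.0:
        raise ValueError("p_n must lie in (0, 1]")
    return 1.0 / p_n


if __name__ == "__main__":
    for kmer in ("ACGT", "ACGAC", "AAAA"):  # arbitrary example words
        print(kmer, overlap_lengths(kmer), is_self_overlapping(kmer))
    # Illustrative probability only, not a value from the paper.
    print(approx_waiting_time(0.25))
```

A word such as ACGAC, whose prefix AC is also its suffix, falls outside the non-self-overlapping assumption; this is the case that the automaton and clump-based approaches are meant to handle.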
Domains
Formal Languages and Automata Theory [cs.FL]
Origin: Files produced by the author(s)