Checking homogeneity of pattern distribution in heterogeneous sequences
Abstract
Studying the distribution of a motif along sequences may help to understand its biological function or to detect regions of interest. A statistical model is needed to assess the significance of the observed distribution. We propose a heterogeneous compound Poisson process that accounts both for possible overlaps between occurrences and for heterogeneity of the sequence known a priori. We describe the parameter estimation procedure and propose tests of homogeneous sub-models. We also consider the detection of rich regions, using either cumulated distances or moving intervals, via a homogenization technique. The method is illustrated with applications to bacterial genomes.
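The Python sketch below illustrates these ingredients under simplifying assumptions that are ours, not necessarily the paper's. The sequence is cut a priori into labelled segments with a constant clump rate per label. Overlapping occurrences are assumed to be already grouped into clumps, whose sizes follow a geometric law with overlap parameter `a`. Homogenization is done by the time change x ↦ Λ(x), where Λ is the cumulated clump intensity. The functions `simulate`, `estimate`, `homogenize` and `max_window_count`, and the segment labels, are hypothetical illustrations. Nothing here computes p-values.

```python
import numpy as np

# Hypothetical setup (assumption): segments (start, end, label) are sorted,
# contiguous and known a priori; the clump rate is constant within a label.


def simulate(segments, rates, a, rng):
    """Simulate clump positions and clump sizes along the sequence.

    rates: dict label -> expected number of clumps per base
    a:     overlap parameter; clump size K is geometric on {1, 2, ...}
           with P(K = k) = (1 - a) * a**(k - 1)  (assumed clump law)
    """
    pos, size = [], []
    for start, end, label in segments:
        n = rng.poisson(rates[label] * (end - start))
        pos.append(rng.uniform(start, end, n))
        size.append(rng.geometric(1 - a, n))  # trials until first success, >= 1
    pos, size = np.concatenate(pos), np.concatenate(size)
    order = np.argsort(pos)
    return pos[order], size[order]


def estimate(segments, pos, size):
    """Maximum-likelihood estimates of the per-label clump rates and of a."""
    length, count = {}, {}
    for start, end, label in segments:
        length[label] = length.get(label, 0) + (end - start)
        inside = (pos >= start) & (pos < end)
        count[label] = count.get(label, 0) + int(inside.sum())
    rates = {lab: count[lab] / length[lab] for lab in length}
    # Geometric MLE: success probability = number of clumps / total occurrences.
    a_hat = 1 - len(size) / size.sum() if len(size) else 0.0
    return rates, a_hat


def homogenize(segments, rates, pos):
    """Map each position x to Lambda(x), the cumulated clump intensity.

    Under the model, the transformed clump positions form a homogeneous
    Poisson process of rate 1 on [0, Lambda(L)].
    """
    knots, cum = [segments[0][0]], [0.0]
    for start, end, label in segments:
        knots.append(end)
        cum.append(cum[-1] + rates[label] * (end - start))
    return np.interp(pos, knots, cum)  # Lambda is piecewise linear


def max_window_count(t, size, width):
    """Largest number of occurrences (summed clump sizes) in a moving window
    [t_i, t_i + width) of the homogenized scale; t must be sorted."""
    csum = np.concatenate([[0], np.cumsum(size)])
    best, j = 0, 0
    for i in range(len(t)):
        while j < len(t) and t[j] < t[i] + width:
            j += 1
        best = max(best, int(csum[j] - csum[i]))
    return best


# Usage on simulated data with two hypothetical region types "A" and "B".
rng = np.random.default_rng(0)
segments = [(0, 40_000, "A"), (40_000, 100_000, "B")]
pos, size = simulate(segments, {"A": 0.002, "B": 0.0005}, 0.3, rng)
rates_hat, a_hat = estimate(segments, pos, size)
t = homogenize(segments, rates_hat, pos)
print(rates_hat, a_hat, max_window_count(t, size, width=5.0))
```

If the model holds, the homogenized positions `t` should look like a rate-1 homogeneous Poisson process regardless of the region type. An unusually large moving-window count then points to a candidate rich region. Its significance would still have to be assessed, for instance against simulations under the fitted model.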