On the diversity of pattern distributions in rational language

Cyril Banderier 1,*, Olivier Bodini 1,*, Yann Ponty 2,3,*, Hanane Tafat 1
* Corresponding author
3 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
Abstract: It is well known that, under some aperiodicity and irreducibility conditions, the number of occurrences of local patterns within a Markov chain (and, more generally, within the languages generated by weighted regular expressions/automata) follows a Gaussian distribution with both mean and variance in Θ(n). By contrast, when these conditions no longer hold, it has been noted that the limiting distribution may follow a whole diversity of distributions, including uniform, power-law, or even multimodal distributions, arising as trade-offs between the structural properties of the regular expression and the weights/probabilities associated with its transitions/letters. However, these cases only partially cover the full diversity of behaviors induced within regular expressions, and a characterization of the attainable distributions remained to be provided. In this article, we constructively show that the limiting distribution of the simplest foreseeable motif (a single letter!) may already follow an arbitrarily complex continuous distribution (or càdlàg process). We also give applications in random generation (Boltzmann sampling) and bioinformatics (parsimonious segmentation of DNA).
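To make the contrast in the abstract concrete, here is a minimal sketch (not the authors' construction) that counts, by dynamic programming over a deterministic automaton, the accepted words of length n grouped by the number of occurrences of one letter, with every letter given weight 1. The function name `letter_count_distribution` and the dictionary encoding of transitions are illustrative assumptions. For the one-state automaton of {a,b}*, which is irreducible and aperiodic, the counts are binomial, hence Gaussian in the limit; for a*b*, whose automaton has two strongly connected components, each count from 0 to n occurs exactly once, giving a uniform distribution.

```python
from collections import defaultdict


def letter_count_distribution(delta, start, accepting, alphabet, n, letter):
    """Map k -> number of accepted words of length n containing `letter` exactly k times.

    `delta` is a (hypothetical) encoding of a deterministic automaton:
    a dict from (state, symbol) to the next state; missing keys are dead transitions.
    """
    # layer[(q, k)] = number of length-i prefixes reaching state q with k occurrences
    layer = {(start, 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (q, k), count in layer.items():
            for x in alphabet:
                r = delta.get((q, x))
                if r is None:
                    continue  # no transition: this extension is rejected
                nxt[(r, k + int(x == letter))] += count
        layer = nxt
    # Keep only prefixes ending in an accepting state, summed over states
    dist = defaultdict(int)
    for (q, k), count in layer.items():
        if q in accepting:
            dist[k] += count
    return dict(sorted(dist.items()))


# {a,b}*: a single state with loops on both letters (irreducible, aperiodic)
full = {(0, "a"): 0, (0, "b"): 0}
# a*b*: state 0 reads a's, state 1 reads b's (two strongly connected components)
a_star_b_star = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}

print(letter_count_distribution(full, 0, {0}, "ab", 4, "a"))
# -> {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
print(letter_count_distribution(a_star_b_star, 0, {0, 1}, "ab", 4, "a"))
# -> {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
```

The first output is the binomial row C(4, k); the second is flat. Non-uniform letter weights, as considered in the article, would replace the unit increments `+= count` with weighted ones.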
Document type: Conference paper
ANALCO - 12th Meeting on Analytic Algorithmics and Combinatorics - 2012, Jan 2012, Kyoto, Japan. Omnipress, pp.107--116, 2012

Cited literature: 22 references

https://hal.archives-ouvertes.fr/hal-00643598
Contributor: Yann Ponty
Submitted on: Wednesday, 15 February 2012, 02:19:55
Last modified on: Thursday, 7 February 2019, 14:49:58
Archived on: Thursday, 14 June 2012, 16:27:22

Files

analco_soda.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00643598, version 1


Citation

Cyril Banderier, Olivier Bodini, Yann Ponty, Hanane Tafat. On the diversity of pattern distributions in rational language. ANALCO - 12th Meeting on Analytic Algorithmics and Combinatorics - 2012, Jan 2012, Kyoto, Japan. Omnipress, pp.107--116, 2012. 〈hal-00643598〉


Metrics

Record views: 799

File downloads: 420