| HAL: hal-00452701, version 1 |
|
|
| 5th International Colloquium on Mathematics and Computer Science (MathInfo'08), Blaubeuren : Germany (2008) |
|
|
|
|
| Constructions for Clumps Statistics. |
|
|
| Frédérique Bassino 1, Julien Clément 2 |
|
|
| (2008-09-01) |
|
|
| We consider a component of word statistics known as the clump: given a finite set of words, a clump is a maximal set of overlapping occurrences of these words in a text. This object was first studied by Schbath [22] with the aim of counting the number of occurrences of words in random texts. Later work with a similar probabilistic approach used the Chen-Stein approximation for a compound Poisson distribution, where the number of clumps follows a law close to Poisson. At present there is no combinatorial counterpart to this approach, and we fill the gap here. We also provide a construction for the as yet unsolved problem of clumps of an arbitrary finite set of words. In contrast with the probabilistic approach, which provides only asymptotic results, the combinatorial method provides exact results that are useful when considering short sequences. |
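To make the notion of a clump concrete, the following minimal Python sketch locates clumps in a given text by brute force. It is only an illustration, not the construction of the paper: the function names `find_occurrences` and `clumps` are hypothetical, and it assumes that two occurrences overlap when they share at least one letter position, so that occurrences which merely touch end to start fall into different clumps.

```python
def find_occurrences(text, words):
    """Return the sorted half-open intervals (start, end) of every
    occurrence in `text` of any word from `words`."""
    occurrences = []
    for word in set(words):
        if not word:
            continue  # ignore the empty word
        start = text.find(word)
        while start != -1:
            occurrences.append((start, start + len(word)))
            # Restart one position later so overlapping occurrences are found.
            start = text.find(word, start + 1)
    return sorted(occurrences)


def clumps(text, words):
    """Group occurrences into clumps.

    Assumption (not taken from the record): occurrences belong to the
    same clump when they share at least one letter position.
    Returns a list of (start, end, number_of_occurrences) triples.
    """
    result = []
    for start, end in find_occurrences(text, words):
        if result and start < result[-1][1]:
            # The occurrence overlaps the current clump: extend it.
            clump_start, clump_end, count = result[-1]
            result[-1] = (clump_start, max(clump_end, end), count + 1)
        else:
            # No shared position: a new clump begins here.
            result.append((start, end, 1))
    return result


print(clumps("aaabaaaa", ["aa"]))  # [(0, 3, 2), (4, 8, 3)]
```

With the single word `aa`, the text `aaabaaaa` contains five occurrences grouped into two clumps, `aaa` and `aaaa`. Counting clumps rather than occurrences is what separates the Poisson-like part of the statistics from the clump sizes mentioned in the abstract.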
|
| 1: | Laboratoire d'informatique de Paris-nord (LIPN) |
| CNRS : UMR7030 – Université Paris XIII - Paris Nord | |
| 2: | Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen (GREYC) |
| CNRS : UMR6072 – Université de Caen Basse-Normandie – Ecole Nationale Supérieure d'Ingénieurs de Caen | |
| 3: | Laboratoire d'informatique de l'école polytechnique (LIX) |
| CNRS : UMR7161 – Polytechnique - X | |
|
| Subject | : | Computer Science/Data Structures and Algorithms; Computer Science/Discrete Mathematics; Mathematics/Combinatorics |
|
|
| Word counting – formal language decomposition – generating functions – automata |
|
|
| http://hal.archives-ouvertes.fr/hal-00452701 | |
| oai:hal.archives-ouvertes.fr:hal-00452701 | |
| From: Frédérique Bassino | |
| Submitted on: Tuesday, 2 February 2010 18:52:27 | |
| Updated on: Tuesday, 2 February 2010 20:46:30 | |