| HAL : hal-00095370, version 4 |
| arXiv : math/0701347 |
|
| 4th Colloquium on Mathematics and Computer Science, France (2006) |
|
|
| Available versions: | v1 (12-01-2007) | v2 (28-08-2007) | v3 (29-08-2007) | v4 (22-04-2011) |
|
| Efficient estimation of the cardinality of large data sets |
|
|
| Philippe Chassaing (1), Lucas Gerin (1) |
|
|
| (2006) |
|
|
| F. Giroire has recently proposed an algorithm that returns the approximate number of distinct elements in a large sequence of words, under strong constraints arising from the analysis of large databases. His estimator is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimator, using Kullback information and estimation theory. |
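The abstract does not spell out Giroire's algorithm or the authors' estimator. As a minimal sketch of the general idea (hash each word to a pseudo-uniform value in $[0,1]$, then infer the number of distinct words from the smallest hash values), the Python below uses the standard k-th minimum estimator $(k-1)/U_{(k)}$. If $U_{(k)}$ is the k-th smallest of $n$ independent uniform values, it follows a $\mathrm{Beta}(k, n-k+1)$ distribution, so $\mathbb{E}[(k-1)/U_{(k)}] = n$ for $k \ge 2$. This is an illustrative choice, not necessarily the estimator derived in the paper; the function names `hash_to_unit` and `estimate_distinct`, the use of SHA-1 as a stand-in for a random hash function, and the default `k = 64` are all assumptions.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1).

    Assumption: SHA-1 output behaves like a random function. We keep
    53 bits so the result is exactly representable as a float < 1.0.
    """
    digest = hashlib.sha1(word.encode("utf-8")).digest()
    bits53 = int.from_bytes(digest[:8], "big") >> 11
    return bits53 / 2**53


def estimate_distinct(words, k: int = 64) -> float:
    """Estimate the number of distinct words from the k smallest hashes.

    Hypothetical helper illustrating the k-th minimum estimator
    (k - 1) / U_(k); it is not claimed to be the paper's estimator.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    heap = []     # max-heap (values stored negated) of the k smallest distinct hashes
    kept = set()  # same values as the heap, used to skip repeated words
    for w in words:
        h = hash_to_unit(w)
        if h in kept:
            continue  # repeated word: identical hash, already counted
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h beats the current k-th minimum: swap it in, evict the largest
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return float(len(heap))
    u_k = -heap[0]  # k-th smallest distinct hash value
    return (k - 1) / u_k
```

Repeated words hash to the same value, so duplicates leave the estimate unchanged, and memory stays at $O(k)$ regardless of the stream length. The variance of $(k-1)/U_{(k)}$ is $n(n-k+1)/(k-2)$, so the relative standard error is about $1/\sqrt{k-2}$ when $n$ is much larger than $k$: a larger `k` trades memory for accuracy.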
|
| 1 : | Institut Elie Cartan Nancy (IECN) |
| CNRS : UMR7502 – INRIA – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine | |
|
| Probability and statistics |
|
| Domain: | Mathematics/Probability |
|
|
| cardinality – large multiset – approximate counting – data stream algorithms |
|
|
|
| hal-00095370, version 4 | |
| http://hal.archives-ouvertes.fr/hal-00095370 | |
| oai:hal.archives-ouvertes.fr:hal-00095370 | |
| Contributor: Lucas Gerin | |
| Submitted on: Friday 22 April 2011, 11:06:51 | |
| Last modified on: Tuesday 26 April 2011, 11:15:29 | |