Efficient estimation of the cardinality of large data sets

Philippe Chassaing; Lucas Gerin

Conference paper, Year: 2006

Philippe Chassaing

Role: Author
PersonId : 7545
IdHAL : philippe-chassaing
IdRef : 060774274

Institut Élie Cartan de Nancy

Lucas Gerin

Role: Author
PersonId : 835101

Institut Élie Cartan de Nancy

Abstract

F. Giroire has recently proposed an algorithm that returns the approximate number of distinct elements in a large sequence of words, under strong constraints arising from the analysis of large databases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
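To illustrate the general idea the abstract refers to (hashing words to values that behave like uniform random variables in $[0,1]$ and reading the number of distinct elements off their smallest values), here is a minimal sketch. It is not Giroire's algorithm and not the estimator proposed in this note; it uses the classical k-th minimum value estimate $(k-1)/M_k$, where $M_k$ is the k-th smallest distinct hash. The names `hash_to_unit_interval` and `estimate_cardinality`, the choice of SHA-256, and the default `k=256` are all assumptions made for the example.

```python
import hashlib
import heapq


def hash_to_unit_interval(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1) (assumed hash: SHA-256)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Estimate the number of distinct words using only O(k) memory.

    Keeps the k smallest distinct hash values seen so far. Duplicates hash
    to the same value, so repetitions never change the state.
    """
    heap = []      # max-heap (values negated) holding the k smallest hashes
    kept = set()   # same values, for constant-time duplicate checks
    for word in words:
        h = hash_to_unit_interval(word)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New value beats the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen: the count is exact.
        return float(len(heap))
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest


if __name__ == "__main__":
    stream = [f"word{i % 20000}" for i in range(100000)]
    print(estimate_cardinality(stream))
```

With this choice of estimator the relative error is roughly of order $1/\sqrt{k}$, so a larger `k` trades memory for accuracy.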

Keywords

cardinality large multiset approximate counting data stream algorithms

Subject areas

Probability [math.PR]

Main file

EfficientEstimation.pdf (178.17 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00095370

Submitted on: Friday, April 22, 2011, 11:06:51

Last modified on: Wednesday, April 3, 2024, 13:54:02

Long-term archiving on: Saturday, July 23, 2011, 02:36:24

Dates and versions

hal-00095370 , version 1 (12-01-2007)

hal-00095370 , version 2 (28-08-2007)

hal-00095370 , version 3 (29-08-2007)

hal-00095370 , version 4 (22-04-2011)

hal-00095370 , version 5 (17-08-2015)

License

Attribution

Identifiers

HAL Id : hal-00095370 , version 4
ARXIV : math/0701347

Cite

Philippe Chassaing, Lucas Gerin. Efficient estimation of the cardinality of large data sets. Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities, 2006, Nancy, France. pp.419-422. ⟨hal-00095370v4⟩