Book sections

Note sur l'approximation de la loi hypergéométrique par la formule de Muller [Note on the approximation of the hypergeometric law by Muller's formula]

Abstract: The argument developed here starts from the computation of the probability that a word will be absent from a random sample drawn without replacement (an "exhaustive" sample) from a corpus whose complete frequency distribution is known. This probability is the basis of the formula put forward, more than 20 years ago, by C. Muller. Muller's formula is compared here with its equivalent in the hypergeometric model. Two studies were carried out: first, the computation of vocabulary growth in corpora and, second, the comparison between Muller's values and the averages obtained by drawing a large number of random samples from several corpora. The formula is thus shown to be a good approximation of the hypergeometric law. The need to associate standard deviations with the computed values is also emphasised, since confidence levels have to be taken into account.
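The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the usual statement of Muller's formula, in which a word occurring f times in a corpus of N tokens is absent from a sample of n tokens with probability (1 - n/N)^f, and sets it against the hypergeometric value C(N - f, n) / C(N, n). The function names and the toy frequency spectrum are hypothetical and are not taken from the chapter.

```python
from math import comb


def p_absent_hypergeometric(N, n, f):
    """Probability that a word with f occurrences is absent from a sample
    of n tokens drawn without replacement from a corpus of N tokens."""
    return comb(N - f, n) / comb(N, n)


def p_absent_muller(N, n, f):
    """Absence probability under the assumed form of Muller's formula:
    q**f, where q = 1 - n/N is the share of the corpus left out."""
    return (1 - n / N) ** f


def expected_vocabulary(spectrum, n, p_absent):
    """Expected number of distinct words in a sample of n tokens.

    spectrum maps a frequency f to the number of words occurring f times.
    """
    N = sum(f * v for f, v in spectrum.items())  # corpus size in tokens
    V = sum(spectrum.values())                   # corpus vocabulary size
    return V - sum(v * p_absent(N, n, f) for f, v in spectrum.items())


# Hypothetical toy spectrum: 50 words occur once, 20 twice, and so on.
spectrum = {1: 50, 2: 20, 5: 6, 10: 2}
n = 70  # half of the 140 tokens

print(expected_vocabulary(spectrum, n, p_absent_hypergeometric))
print(expected_vocabulary(spectrum, n, p_absent_muller))
```

The two absence probabilities coincide for f = 1. For larger f the assumed Muller value is slightly larger than the hypergeometric one, so the expected vocabulary it yields is slightly smaller. The sketch does not compute the standard deviations that the abstract recommends attaching to these values.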


https://hal.archives-ouvertes.fr/hal-00758060
Contributor: Dominique Labbé
Submitted on: Wednesday, November 28, 2012 - 8:52:36 AM
Last modification on: Friday, December 3, 2021 - 3:40:25 AM
Long-term archiving on: Saturday, December 17, 2016 - 4:28:27 PM

File

HubertLabbA_1988a.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00758060, version 1

Collections

CNRS | INSMI | UGA

Citation

Pierre Hubert, Dominique Labbé. Note sur l'approximation de la loi hypergéométrique par la formule de Muller. Dominique Labbé, Philippe Thoiron, Daniel Serant. Etudes sur la richesse et la structure lexicales, Slatkine-Champion, pp.77-91, 1988. ⟨hal-00758060⟩
