Computing the Rao's distance between negative binomial distributions. Application to Exploratory Data Analysis

Abstract : The statistical analysis of counts of living organisms brings information about the collective behavior of species (schooling, habitat preference, etc.), possibly depending on their biological characteristics (growth rate, reproductive power, survival rate, etc.). This task can be carried out in a non-parametric setting, but parametric distributions, such as the negative binomial (NB) distributions studied here, are also very useful for modeling population abundances. Nevertheless, the parametric approach is ill-suited to exploratory analysis, because the apparent distance between parameter values is not meaningful. In contrast, by considering the Riemannian manifold $\mathrm{NB}(D_R)$ of NB distributions equipped with the Rao metric $D_R$, one can compute intrinsic distances between species, which can be regarded as absolute (they do not depend on the chosen parametrization). Unfortunately, computing this distance between two NB distributions $A$ and $B$ requires solving a second-order nonlinear differential equation (the geodesic equation), whose solution cannot always be found in an acceptable length of time with enough precision. While Manté and Kidé [1] proposed numerical remedies to this problem, we propose a geometrical one, based on Poisson approximation. It consists in replacing $A$ and/or $B$ by "equivalent", better-suited distribution(s) before computing the distance, whenever possible. The proposed method is illustrated by displaying distributions of counts of marine species: these counts having been fitted by NB distributions, we compute the distance table $\Delta$ between species and represent $\Delta$ through multidimensional scaling (MDS).

Keywords : Poisson approximation, Multidimensional scaling

Notations
Consider a Riemannian manifold $M$ and a parametric curve $\alpha : [a, b] \to M$. Its first derivative will be denoted $\dot{\alpha}$. A geodesic curve $\gamma$ connecting two points $p$ and $q$ of $M$ will be denoted p q, and p s ⊕ s q will denote the broken geodesic [2] connecting $p$ to $q$ with a stopover at $s$.
We will also consider, for any $\theta \in M$, the local norm $\|\cdot\|_{g(\theta)}$ associated with the metric $g$ on the tangent space $T_\theta M$:

$$\forall V \in T_\theta M, \quad \|V\|_{g(\theta)} := \sqrt{V^{t}\, g(\theta)\, V}. \qquad (1)$$

The length of a curve $\alpha$ traced on $M$ will be denoted $\Lambda(\alpha)$, that is, $\Lambda(\alpha) = \int_a^b \|\dot{\alpha}(t)\|_{g(\alpha(t))}\,dt$. A parametric probability distribution $L_i$ will be identified with its coordinates with respect to some chosen parametrization; for instance, we will write $L_i \equiv (\phi_i, \mu_i)$ for a negative binomial distribution. In addition, $\mathbb{R}_+^* := \,]0, +\infty[$, and $\|M\|_F$ will denote the Frobenius norm of a matrix $M$; logical propositions will be combined using the classical connectives $\vee$ (or) and $\wedge$ (and).
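Eq. (1) and the curve length $\Lambda(\alpha)$ can be made concrete for NB distributions with a minimal Python sketch that uses only the standard library. The sketch assumes that $(\phi, \mu)$ denotes the size (dispersion) and the mean, so that $P(Y=0) = (\phi/(\phi+\mu))^{\phi}$. Under that assumption, a standard computation gives a diagonal Fisher information matrix with $g_{\mu\mu} = \phi/(\mu(\mu+\phi))$ and $g_{\phi\phi} = \sum_{j \ge 0} P(Y>j)/(\phi+j)^2 - \mu/(\phi(\phi+\mu))$. The function names are hypothetical. The series is truncated, and the length is integrated with a plain trapezoidal rule, so this is an illustration and not the paper's implementation.

```python
import math

def nb_fisher_metric(phi, mu, tol=1e-14, max_terms=100_000):
    """Fisher information matrix of NB(size=phi, mean=mu) at (phi, mu).

    Assumed parametrization: P(Y = 0) = (phi / (phi + mu)) ** phi.
    Returns [[g_phiphi, 0], [0, g_mumu]] (the matrix is diagonal).
    """
    q = mu / (phi + mu)               # ratio used in the pmf recurrence
    pmf = (phi / (phi + mu)) ** phi   # P(Y = 0)
    cdf = pmf
    series = 0.0
    for j in range(max_terms):
        surv = 1.0 - cdf              # P(Y > j)
        if surv < tol and j > mu:     # truncate once the tail is negligible
            break
        series += surv / (phi + j) ** 2
        pmf *= (j + phi) / (j + 1) * q    # P(Y = j + 1)
        cdf += pmf
    g_pp = series - mu / (phi * (phi + mu))
    g_mm = phi / (mu * (mu + phi))
    return [[g_pp, 0.0], [0.0, g_mm]]

def local_norm(V, g):
    """Eq. (1): ||V||_{g(theta)} = sqrt(V^t g(theta) V)."""
    quad = sum(V[i] * g[i][j] * V[j] for i in range(2) for j in range(2))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative round-off

def curve_length(alpha, dalpha, a, b, n=2000):
    """Lambda(alpha): integral of ||alpha'(t)|| over [a, b], composite trapezoidal rule."""
    h = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        t = a + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * local_norm(dalpha(t), nb_fisher_metric(*alpha(t)))
    return total * h

# Hypothetical curve: the constant-phi segment from (phi, mu) = (2, 1) to (2, 4).
length = curve_length(lambda t: (2.0, 1.0 + 3.0 * t), lambda t: (0.0, 3.0), 0.0, 1.0)
print(length)
```

The constant-$\phi$ segment is not a geodesic of the NB manifold, because $g_{\mu\mu}$ depends on $\phi$. Its length therefore only bounds the Rao distance between its endpoints from above. Obtaining the distance itself still requires solving the geodesic equation.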
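Two closed-form facts show why a Poisson approximation is geometrically sensible for NB distributions with a large $\phi$. First, Poisson distributions have Fisher information $1/\lambda$, so their Rao distance is $2|\sqrt{\lambda_1} - \sqrt{\lambda_2}|$. Second, the NB component $g_{\mu\mu} = \phi/(\mu(\mu+\phi))$ (again assuming $\phi$ is the size and $\mu$ the mean) integrates along a constant-$\phi$ path to $2\sqrt{\phi}\,|\operatorname{asinh}\sqrt{\mu_2/\phi} - \operatorname{asinh}\sqrt{\mu_1/\phi}|$. That length tends to the Poisson distance as $\phi \to \infty$. The Python sketch below is only an illustration with hypothetical function names. It is not the replacement procedure of the paper, whose details are not given here.

```python
import math

def poisson_rao_distance(lam1, lam2):
    """Rao distance between Poisson(lam1) and Poisson(lam2): 2 |sqrt(lam1) - sqrt(lam2)|."""
    return 2.0 * abs(math.sqrt(lam1) - math.sqrt(lam2))

def nb_constant_phi_length(phi, mu1, mu2):
    """Length of the constant-phi path from (phi, mu1) to (phi, mu2), in closed form.

    Integrates sqrt(g_mumu) = sqrt(phi / (mu * (mu + phi))) over [mu1, mu2].
    The path is not a geodesic, so this only bounds the NB Rao distance from above.
    """
    return 2.0 * math.sqrt(phi) * abs(
        math.asinh(math.sqrt(mu2 / phi)) - math.asinh(math.sqrt(mu1 / phi))
    )

# As phi grows, the NB lengths increase toward the Poisson distance between the means.
target = poisson_rao_distance(1.0, 4.0)
lengths = [nb_constant_phi_length(phi, 1.0, 4.0) for phi in (1.0, 10.0, 100.0, 1e4, 1e6)]
print(target, lengths)
```

Since $g_{\mu\mu} < 1/\mu$ for every finite $\phi$ and increases with $\phi$, the lengths grow monotonically and stay below the Poisson value.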
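The final step of the abstract represents the distance table $\Delta$ through MDS. It can be sketched with classical (Torgerson) MDS in Python with NumPy. This choice is an assumption, since the abstract does not state which MDS variant is used, and the name `classical_mds` is hypothetical. Rao distances are not Euclidean in general, so the double-centred matrix may have negative eigenvalues; the sketch simply discards them.

```python
import numpy as np

def classical_mds(delta, k=2):
    """Classical (Torgerson) MDS of a symmetric distance table delta.

    Returns an (n, k) array of coordinates whose Euclidean distances
    approximate delta; they match exactly when delta is Euclidean of rank <= k.
    """
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (delta ** 2) @ J             # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    lam = np.clip(eigvals[order], 0.0, None)    # discard negative (non-Euclidean) parts
    return eigvecs[:, order] * np.sqrt(lam)

# Toy check with hypothetical points (not data from the paper).
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
delta = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(delta, k=2)
recovered = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(recovered, delta))
```

For a genuine table of Rao distances between species, the discarded negative eigenvalues measure how far $\Delta$ departs from a Euclidean configuration.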
Document type : Preprints, Working Papers, ...

Cited literature [41 references]

https://hal.archives-ouvertes.fr/hal-02130199
Contributor : Claude Manté
Submitted on : Wednesday, May 15, 2019 - 4:13:46 PM
Last modification on : Thursday, June 13, 2019 - 12:40:28 PM

File

JMVA_MantePartIMain.pdf
Files produced by the author(s)

Identifiers

• HAL Id : hal-02130199, version 1

Citation

Claude Manté. Computing the Rao's distance between negative binomial distributions. Application to Exploratory Data Analysis. 2019. ⟨hal-02130199⟩
