Journal article in Systematic Biology, 2014

Missing data estimation in morphometrics: how much is too much?

Abstract

Fossil-based estimates of diversity and evolutionary dynamics mainly rely on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. Such a loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data involves imputation methods that directly fill in the missing cases with model estimates. In recent years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies have shown that this limit actually depends on various properties of the study data set and of the selected imputation method, and is by no means generalizable. We evaluate the relative performance of seven multiple imputation (MI) techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, Fully Conditional Specification and Expectation-Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. MI techniques appear remarkably robust to the violation of basic assumptions such as the occurrence of taxonomically or anatomically biased patterns of missing data distribution, so that differences in simulation results between the three patterns of missing data distribution are much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach combining the use of MIs with Procrustean superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.
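
The general idea of combining multiple imputations with Procrustean superimposition of PCA results can be illustrated with a minimal Python/NumPy sketch. This is not the authors' R function. The imputer used here (random draws from each variable's observed mean and standard deviation) is a deliberately crude, hypothetical stand-in for the MI techniques compared in the paper, and the function names, the simulated data set, and the choice of the PCA of the averaged imputed data as the superimposition reference are all assumptions made for illustration.

```python
import numpy as np

def impute_once(X, rng):
    """Fill NaNs with draws from each variable's observed mean and SD.
    Hypothetical, crude imputer used only to generate several completed data sets."""
    Xf = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        obs = X[~miss, j]
        Xf[miss, j] = rng.normal(obs.mean(), obs.std(ddof=1), miss.sum())
    return Xf

def pca_scores(X, k=2):
    """Scores of the specimens on the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def procrustes_fit(ref, Y):
    """Least-squares Procrustes fit: centre Y, then rotate/reflect and scale it
    onto the centred reference configuration. Returns centred coordinates."""
    A = ref - ref.mean(axis=0)
    B = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(B.T @ A)
    rotation = U @ Vt
    scale = s.sum() / (B ** 2).sum()
    return scale * B @ rotation

def imputation_spread(X, m=20, k=2, seed=0):
    """Impute m times, ordinate each completed data set, superimpose every
    ordination on a common reference, and measure how far each specimen
    moves across imputations."""
    rng = np.random.default_rng(seed)
    imputed = [impute_once(X, rng) for _ in range(m)]
    # Assumed reference: PCA of the average of the imputed data sets.
    ref = pca_scores(np.mean(imputed, axis=0), k)
    aligned = np.stack([procrustes_fit(ref, pca_scores(Xi, k)) for Xi in imputed])
    centroid = aligned.mean(axis=0)
    # Mean Euclidean distance of each specimen from its own centroid.
    spread = np.sqrt(((aligned - centroid) ** 2).sum(axis=2)).mean(axis=0)
    return centroid, spread

# Simulated example: 30 specimens, 5 correlated measurements, about 10% missing.
rng = np.random.default_rng(1)
full = rng.normal(size=(30, 1)) @ np.ones((1, 5)) + 0.3 * rng.normal(size=(30, 5))
X = full.copy()
X[rng.random(X.shape) < 0.1] = np.nan

centroid, spread = imputation_spread(X)
print(np.round(spread, 3))
```

Plotting `centroid` with a marker or ellipse sized by `spread` for each specimen gives a direct picture of how much each individual's position in the ordinated space depends on the imputed values, which is the kind of visual diagnostic the abstract describes in place of a fixed missing-data threshold.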

Dates and versions

hal-00993597, version 1 (20-05-2014)

Identifiers

HAL Id: hal-00993597
DOI: 10.1093/sysbio/syt100

Cite

Julien Clavel, Gildas Merceron, Gilles Escarguel. Missing data estimation in morphometrics: how much is too much? Systematic Biology, 2014, 63 (2), pp. 203-218. ⟨10.1093/sysbio/syt100⟩. ⟨hal-00993597⟩