Handling of missing data to improve the mining of large feed databases - HAL open archive
Journal article: Journal of Animal Science, 2013


Abstract

Feed databases often have missing data. Despite their potentially major effect on data analysis (e.g., as a source of biased results and loss of statistical power), database managers and nutrition researchers have paid little attention to missing data. This study evaluated various methods of handling missing data using mining outputs from a database containing data on chemical composition and nutritive value for 18,864 alfalfa samples. A complete reference dataset was obtained comprising the 2,303 cases with no missing data for the attributes CP, crude fiber (CF), NDF, ADF, and ADL. This dataset was used to simulate 2 types of missing data (at random and not at random), each with 2 loss intensities (33 and 66%), thus yielding a total of 4 incomplete datasets. Missing data from these datasets were handled using 2 deletion methods and 4 imputation methods, and outputs in terms of the identification and typing of alfalfa (using ANOVA and descriptive statistics) and of correlations between attributes (using regressions) were compared with outputs from the complete dataset. Imputation methods, particularly model-based versions, were found to perform better than deletion methods in terms of maximizing information use and minimizing bias, although the extent of differences between methods depended on the type of missing data. The best approximation to the uncertainty value was provided by multiple imputation methods. It was concluded that the choice of the most suitable method for handling missing data depended both on the type of missing data and on the purpose of data analysis.
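The evaluation design described above (mask a complete reference set, handle the gaps, compare the outputs against the complete data) can be sketched in a few lines. The following Python sketch is illustrative only and is not the authors' code: it uses synthetic, hypothetical NDF and ADF values rather than the alfalfa data, simulates missing completely at random (MCAR) loss and one assumed not-at-random (MNAR) scheme in which the highest ADF values are lost, and compares listwise deletion, mean imputation, and a single regression imputation of ADF from NDF. All function names are hypothetical, and the abstract does not name the specific deletion and imputation methods used in the study.

```python
import random
import statistics


def make_complete(n, seed=0):
    """Hypothetical synthetic stand-in for a complete reference dataset.

    Each row is an (NDF, ADF) pair with ADF linearly related to NDF plus noise.
    These are NOT values from the alfalfa database.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ndf = rng.gauss(45.0, 5.0)
        adf = 0.7 * ndf + rng.gauss(0.0, 1.5)
        rows.append((ndf, adf))
    return rows


def mask_adf(rows, rate, mechanism, seed=1):
    """Return a copy of rows with ADF set to None in a fraction `rate` of rows.

    MCAR: rows are chosen uniformly at random.
    MNAR (assumed scheme): the rows with the highest ADF are lost, so
    missingness depends on the unobserved value itself.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(rows))
    if mechanism == "MCAR":
        missing = set(rng.sample(range(len(rows)), n_missing))
    elif mechanism == "MNAR":
        by_adf = sorted(range(len(rows)), key=lambda i: rows[i][1], reverse=True)
        missing = set(by_adf[:n_missing])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [(ndf, None if i in missing else adf) for i, (ndf, adf) in enumerate(rows)]


def listwise_deletion(rows):
    """Drop every row with a missing ADF value."""
    return [(ndf, adf) for ndf, adf in rows if adf is not None]


def mean_imputation(rows):
    """Fill missing ADF with the mean of the observed ADF values."""
    mean_adf = statistics.fmean([adf for _, adf in rows if adf is not None])
    return [(ndf, mean_adf if adf is None else adf) for ndf, adf in rows]


def regression_imputation(rows):
    """Fill missing ADF with an OLS prediction from NDF, fitted on observed rows."""
    obs = [(ndf, adf) for ndf, adf in rows if adf is not None]
    mx = statistics.fmean([x for x, _ in obs])
    my = statistics.fmean([y for _, y in obs])
    slope = sum((x - mx) * (y - my) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
    intercept = my - slope * mx
    return [(ndf, intercept + slope * ndf if adf is None else adf) for ndf, adf in rows]


METHODS = {
    "listwise deletion": listwise_deletion,
    "mean imputation": mean_imputation,
    "regression imputation": regression_imputation,
}


def adf_summary(rows):
    """Mean and SD of ADF in a dataset with no missing values."""
    adf = [a for _, a in rows]
    return statistics.fmean(adf), statistics.stdev(adf)


def evaluate(complete, mechanism, rate):
    """Return {method: (bias of the ADF mean, ADF SD)} relative to the complete data."""
    ref_mean, _ = adf_summary(complete)
    incomplete = mask_adf(complete, rate, mechanism)
    results = {}
    for name, method in METHODS.items():
        mean, sd = adf_summary(method(incomplete))
        results[name] = (mean - ref_mean, sd)
    return results


if __name__ == "__main__":
    # Same size as the study's reference set, but the values are synthetic.
    complete = make_complete(2303)
    print(f"reference ADF SD: {adf_summary(complete)[1]:.2f}")
    for mechanism in ("MCAR", "MNAR"):
        for rate in (0.33, 0.66):
            for name, (bias, sd) in evaluate(complete, mechanism, rate).items():
                print(f"{mechanism} {rate:.0%} {name:<22} bias {bias:+.2f}  SD {sd:.2f}")
```

Under MCAR all three methods give nearly unbiased means. Under the assumed MNAR scheme, listwise deletion and mean imputation share the same downward bias, while regression imputation recovers part of it because it exploits the NDF-ADF correlation; mean imputation also shrinks the SD. A single deterministic imputation like this one still understates uncertainty, which is what multiple imputation addresses by drawing several plausible values per missing entry.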
No full-text file deposited

Dates and versions

hal-01019073, version 1 (07-07-2014)

Identifiers

Cite

F. Maroto-Molina, A. Gomez-Cabrera, J.E. Guerrero-Ginel, A. Garrido-Varo, Daniel Sauvant, et al. Handling of missing data to improve the mining of large feed databases. Journal of Animal Science, 2013, 91 (1), pp. 491-500. ⟨10.2527/jas.2012-5491⟩. ⟨hal-01019073⟩