Data pre-processing to improve the mining of large feed databases - HAL open archive
Journal article, Animal, Year: 2013

Data pre-processing to improve the mining of large feed databases

F. Maroto-Molina
  • Role: Author
A. Gomez-Cabrera
  • Role: Author
J.E. Guerrero-Ginel
  • Role: Author
A. Garrido-Varo
  • Role: Author
Gilles Tran
  • Role: Author
Valérie Heuze
  • Role: Author
D.C. Pérez-Marin
  • Role: Author

Abstract

The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
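The abstract does not say which duplicate or outlier tests were used, so the following is only a generic sketch of the univariate versus multivariate comparison it mentions, not the authors' routines. It assumes a z-score rule for the univariate case and a squared Mahalanobis distance for the multivariate case. The function names, the thresholds and the synthetic two-variable data are all hypothetical and are not drawn from the alfalfa database.

```python
import numpy as np


def exact_duplicates(X):
    """Flag rows that repeat an earlier row exactly."""
    _, first_idx = np.unique(X, axis=0, return_index=True)
    mask = np.ones(len(X), dtype=bool)
    mask[first_idx] = False  # first occurrences are not duplicates
    return mask


def univariate_outliers(X, z_max=3.0):
    """Flag rows where any single variable is more than z_max SDs from its mean."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0, ddof=1))
    return (z > z_max).any(axis=1)


def mahalanobis_outliers(X, d2_max):
    """Flag rows whose squared Mahalanobis distance from the centroid exceeds d2_max."""
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return d2 > d2_max


# Synthetic stand-in for two strongly correlated analytical variables.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.1 * rng.normal(size=500)
base = np.column_stack([x, y])

# Row 500: each value is unremarkable alone, but the pair breaks the correlation.
# Row 501: an exact copy of row 0, standing in for a record loaded twice.
X = np.vstack([base, [[2.0, -2.0]], base[:1]])

# Assumed cut-off: the 0.999 chi-square quantile, which for two variables
# (2 degrees of freedom) has the closed form -2 * ln(1 - 0.999).
d2_max = -2.0 * np.log(0.001)

dup = exact_duplicates(X)
uni = univariate_outliers(X)
multi = mahalanobis_outliers(X, d2_max)

print("duplicate rows:", np.flatnonzero(dup))
print("row 500 flagged by univariate rule:", bool(uni[500]))
print("row 500 flagged by multivariate rule:", bool(multi[500]))
```

In this toy case the univariate rule misses row 500 while the multivariate rule catches it, which illustrates why the two kinds of solution are worth comparing. Real feed data would also need decisions this sketch leaves out, such as near-duplicate matching and robust estimates of the mean and covariance.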

Dates and versions

hal-01000843, version 1 (04-06-2014)

Identifiers

HAL Id: hal-01000843
DOI: 10.1017/S1751731113000293

Cite

F. Maroto-Molina, A. Gomez-Cabrera, J.E. Guerrero-Ginel, A. Garrido-Varo, Daniel Sauvant, et al. Data pre-processing to improve the mining of large feed databases. Animal, 2013, 7 (7), pp.1128-1136. ⟨10.1017/S1751731113000293⟩. ⟨hal-01000843⟩
