Data pre-processing to improve the mining of large feed databases

F. Maroto-Molina; A. Gomez-Cabrera; J.E. Guerrero-Ginel; A. Garrido-Varo; Daniel Sauvant; Gilles Tran; Valérie Heuze; D.C. Pérez-Marin

doi:10.1017/S1751731113000293

Journal article, Animal, 2013


Author identifiers

Daniel Sauvant: PersonId 176543; IdHAL daniel-sauvant; IdRef 07736516X; affiliation: Modélisation Systémique Appliquée aux Ruminants

Abstract

The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
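The abstract does not say which univariate and multivariate outlier detectors were compared. The following is purely illustrative. It is a minimal sketch on invented data, not the authors' method, and it contrasts two common choices: per-variable z-scores and the Mahalanobis distance for a pair of variables. The function names, the use of crude protein (CP) and NDF as the two variables, and all values are hypothetical. The point it shows is that a sample can look ordinary on each variable separately yet be atypical in combination.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Univariate check: standardized distance of each value from the sample mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def mahalanobis_2d(xs, ys):
    """Multivariate check for two variables: Mahalanobis distance of each
    (x, y) pair from the centroid, using the sample covariance matrix."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d^T S^-1 d, then square root
        distances.append(math.sqrt(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy))
    return distances


if __name__ == "__main__":
    # Hypothetical samples: CP and NDF in % of dry matter, made up so that
    # the two variables move in opposite directions, except for the last sample.
    cp = [16, 17, 18, 19, 20, 21, 22, 23, 22]
    ndf = [48, 46, 44, 42, 40, 38, 36, 34, 46]
    print("max |z| CP:", max(abs(z) for z in z_scores(cp)))
    print("max |z| NDF:", max(abs(z) for z in z_scores(ndf)))
    d = mahalanobis_2d(cp, ndf)
    print("most atypical sample index:", max(range(len(d)), key=d.__getitem__))
```

On these numbers the last sample has |z| of about 0.9 on both variables, so a univariate rule such as |z| > 3 would not flag it. It nevertheless has the largest Mahalanobis distance (about 2.67), because its CP/NDF combination breaks the joint pattern. On real data the outliers themselves inflate the classical covariance estimate and can mask one another. Robust estimators such as the minimum covariance determinant are therefore often preferred to the plain sample covariance used here.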

Keywords

chemical composition; data integration; outlier; mining; nutritive value

Subject area

Agricultural sciences


https://hal.science/hal-01000843

Submitted on: Wednesday 4 June 2014, 13:45:55

Last modified on: Tuesday 10 October 2023, 16:38:08

Dates and versions

hal-01000843, version 1 (04-06-2014)

Identifiers

HAL Id: hal-01000843, version 1
DOI: 10.1017/S1751731113000293
PRODINRA: 200709
WOS: 000319604800010

Cite

F. Maroto-Molina, A. Gomez-Cabrera, J.E. Guerrero-Ginel, A. Garrido-Varo, Daniel Sauvant, et al. Data pre-processing to improve the mining of large feed databases. Animal, 2013, 7 (7), pp. 1128-1136. ⟨10.1017/S1751731113000293⟩. ⟨hal-01000843⟩


Collections

AGROPARISTECH INRA MOSAR INRAE ARINRAE-ANIMAL ARINRAE PHASE
